Logs question: 302 Hits from Wikipedia (5-10 hits per second)
JRepici
06-01-2007, 01:40 PM
Hi,
I think this happened last year too.
Wikipedia links to my article on CSV, and recently they generated a lot of referral hits to the article that messed up my stats chart views.
Here's a cut-and-paste snippet from the log of 29-May-2007:
May/2007:15:29:14 -0400] "GET /Doc/Articles/CSV/CSV01.htm HTTP/1.1" 302 314
May/2007:15:29:14 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:14 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:14 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:14 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:14 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:14 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:15 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
May/2007:15:29:16 -0400] "GET /Site/Misc/IdxDflt.htm HTTP/1.1" 302 314 "htt
I showed very little here. This goes on for a while. Also, I only showed a small segment of each entry (the part with the timing and the errors).
The referral seemed to start with an attempt to go to the article (first line above). This should have worked, but instead it produced a 302. Upon getting a 302 for the article, it went to my default index page (my .htaccess sends visitors to this page when they try to access a directory without an index). The attempts to access the default index go on for a VERY long time. I have almost 400,000 302 hits for the month of May.
As far as I know the article has never been unavailable. What causes this?
Thank you for any help you can render.
-djr
JRepici
07-14-2007, 07:26 PM
Hi,
I hope someone can help with this.
Now the same article is causing the same symptoms when accessed from a Google query.
What am I doing wrong?
P/1.1" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
GetTerm=World+Wide+Lexicon HTTP/1.1" 200 5521 "-" "Mozilla/5.0 (compatible; Goo
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
dit=authentication HTTP/1.0" 200 4083 "-" "Mozilla/5.0 (compatible; Yahoo! Slur
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
ssary.pl?Edit=hyperword HTTP/1.0" 200 4815 "-" "Mozilla/5.0 (compatible; Yahoo!
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
" 302 314 "http://www.google.com.vn/search?hl=vi&q=csv+with++cobol&btnG=T%C3%AC
The dates on these were around:
14/Jul/2007:02:59:58 -0400
With multiple hits per second.
The ip address on the first one and all the follow-ups was:
203.160.1.43
Which resolved to localhost ? (this is not a private IP address)
(nslookup timed out on the query)
Thanks if you have any suggestions.
-djr
Wassercrats
07-14-2007, 07:53 PM
I get 200 ok for that page with wget from my Futurequest account.
Blocking the referrer's IP may work.
JRepici
07-23-2007, 12:28 PM
Wassercrats,
re:
I get 200 ok for that page with wget from my Futurequest account.
Thanks for your reply. The problem with this is that it seems to happen VERY intermittently, once every month or two. When it does happen, however, it is a deluge, so it spikes my stats and makes the graphs much less useful. I imagine the multiple hits per second also fill bandwidth and reduce other people's ability to reach the article and site.
re:
Blocking the referrer's IP may work.
I'm not sure I'd ever be ready to take the big leap and block all traffic referred to my site from Google and Wikipedia in order to block these few "visitors" who are flooding the servers with hits.
Are there any other suggestions regarding this occasional flood of accesses? I have the IP addresses of the visitors, but they seem to be DHCP'd. My big question is: Where do I go to register a complaint against this kind of behavior? Is there a complaint process at the FCC that is similar to the forms available at the FTC?
Thanks again for your help.
-djr
JRepici
10-08-2007, 04:09 PM
Hi all,
Well, this problem happened again on the 5th (10-2007) and I took a little time today to really analyze the logs.
I found the problem! It is sort of "my bad". It is caused not by a lack of blocking, but BECAUSE of blocking.
The Problem:
I have a company's IP address-range blocked using a deny from ##.##.##. in the '.htaccess' file*. I also use the -Indexes trick to redirect people to a default index when they try to access a directory listing (where there is no index.*htm* file).
So, in the .htaccess file, there's this:
Options -Indexes
ErrorDocument 403 http://www.MySite.com/Site/Misc/IdxDflt.htm
That works great except that it generates the same error code (403) as a deny from**.
SO: When they try to access my content they get a 403: You don't have access. That in turn redirects to the IdxDflt.htm file ON THE SAME SITE.
That generates another 403, which redirects to the same IdxDflt.htm file, which generates another 403, which generates another redirect, ad infinitum.
My Solution:
For the time being, my quick and dirty solution is to simply redirect 403's to a default index page off the site.
i.e.:
Options -Indexes
ErrorDocument 403 http://www.MyOtherSite.com/Site/Misc/IdxDflt.htm
As mentioned, this is Q&D until I get some bench-time to learn more about the .htaccess mod_rewrite stuff (not today). Of course, I'd be grateful for any suggestions of better solutions.
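One alternative that keeps the default index on the same site, offered as a sketch rather than a tested fix: give ErrorDocument a local path instead of a full URL. When the target starts with http://, Apache tells the client to redirect (hence the 302s in the log). With a local path beginning with /, it serves the page internally and keeps the 403 status, so the client is never told to request a second URL. The path below assumes IdxDflt.htm stays where the original directive put it.

```apache
# Refuse directory listings; Apache answers such requests with a 403.
Options -Indexes

# Local path (relative to the document root) instead of a full URL:
# the page is served internally with the 403 status, and no redirect is sent.
ErrorDocument 403 /Site/Misc/IdxDflt.htm
```

A client covered by the deny from line should then get a single 403 response instead of a chain of redirects. Whether that client sees the IdxDflt.htm content or Apache's plain 403 message depends on whether the page itself is also denied to it.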
Hope this helps.
-cvst
* It is a long story. They basically copied my CSV article (http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm) and sold it to the Department of Labor without attribution. The DoL, not knowing they were using stolen property, then published it on the web. I fixed the problem with the DoL, and then denied the very large contractor (#coughSRA#cough) from viewing other site content that they may also wish to take and sell (nothing like an infinite markup to spruce up your earnings reports).
** In essence, when somebody goes to a directory without an index it assumes they are trying to view the directory listing and returns a 403 to say that they don't have access to do that.
garyamort
10-10-2007, 07:12 AM
Hi all,
The Problem:
I have a company's IP address-range blocked using a deny from ##.##.##. in the '.htaccess' file*. I also use the -Indexes trick to redirect people to a default index when they try to access a directory listing (where there is no index.*htm* file).
Why not simply redirect all requests from that range to a "not allowed" page and give them a contact form to use to contact you and request an offline copy of the file (for a fee... like, oh, say 1 million dollars)?
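A rough mod_rewrite sketch of that idea, untested against this site. The range 192.0.2. and the page NotAllowed.htm are placeholders, not values from the thread. It also assumes mod_rewrite is enabled for .htaccess files and that the deny from line for that range has been removed, since otherwise the notice page would be refused as well.

```apache
RewriteEngine On

# Match clients whose address starts with the placeholder range 192.0.2.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.

# Skip the notice page itself, so the redirect cannot loop.
RewriteCond %{REQUEST_URI} !^/Site/Misc/NotAllowed\.htm$

# Send every other request from that range to the (hypothetical) notice page.
RewriteRule ^ /Site/Misc/NotAllowed.htm [R=302,L]
```

The second condition is the important part: the redirect target is excluded from the rule, so a client from the range lands on the notice page once and stays there.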
JRepici
10-10-2007, 01:02 PM
Why not simply redirect all requests from that range to a "not allowed" page and give them a contact form to use to contact you and request an offline copy of the file (for a fee... like, oh, say 1 million dollars)?
garyamort,
Hehehe. Pinky finger to corner of mouth... :)
Great idea! Then I would not only have a range of IP addresses, I may actually start building a contact list of their company.
-John
sheila
10-11-2007, 09:50 PM
What you could do (possibly) is create a dynamic "Default Index" file (say in PHP) that tests for whether the IP address is in a particular list (blocked by you). If it is, display on the Default Index page the appropriate content that you want the blocked parties to see. Otherwise, display the other content that you want viewed when there is no index file and the IP is not blocked.
The reason you had this problem was the loop in your situation.
blocked IP -> 403 error -> 403 default index page -> repeat ad infinitum for blocked IP
To remove the loop, have your 403 document dynamically display the content you want based on whether the IP is "blocked" or not.
Then you have
blocked IP -> 403 error -> default index which displays appropriate content. End of story.
Anyhow, it avoids having to mess with mod_rewrite, and if you're comfortable whipping up a quick and simple PHP page this should solve the matter.
JRepici
10-12-2007, 11:21 AM
Sheila,
What am I missing?
1. Denied_IP_Requests_ANY_Page
2. -> 403 Redirects_To_Default_Page
3. -> Denied_IP_Requests_Default_Page_From_Site
4. Go To #2
As long as the page the 403 redirects to is on the same site where the "deny from" is in force, there will be a problem, no?
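One way to break step 3 without moving the page off-site is to exempt the error page itself from the block. This is a sketch only, assuming the older Order/Allow/Deny access directives that match the deny from line quoted earlier in the thread. The range 192.0.2. is a placeholder, not the real one.

```apache
# Site-wide rule: allow everyone except the placeholder range.
Order allow,deny
Allow from all
Deny from 192.0.2.

# Exemption: the default index page stays reachable for every client,
# including the denied range, so a request for it is not refused again.
<Files "IdxDflt.htm">
    Order allow,deny
    Allow from all
</Files>
```

With the exemption in place, the denied client's request for the default page should succeed, which ends the cycle at step 3. On Apache 2.4 and later the equivalent line inside the Files block would be Require all granted.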
-cvst
sheila
10-12-2007, 02:49 PM
Hrmm. Now I'm confused. But I dealt with an extremely similar situation several years back:
http://www.aota.net/forums/showthread.php?t=2332&highlight=error+loop%2Al
and found a solution to it, that I think was similar to what I described above. Here, you can read for yourself:
http://www.aota.net/forums/showthread.php?p=50998&highlight=error+loop%2A#post50998
Just now I was trying to test what I suggested last night, but for some reason I'm putting a deny directive for my IP address in the .htaccess for one of my sites and it isn't blocking me at all. I'm confused and now I've spent a bit of time on it and it still isn't doing what I want. Grrr. I hope the links above help you...