Log Spammers: Server IP Table Blocking
This is an open question for FQ Staff:
Despite my logs never having been available for public viewing, a single automated log spammer picked up on our site in the last couple of weeks and has been blitzing us. They use various referring URLs, mostly marketing rubbish, and it appears they are paid to do this kind of 'promotion'. (Their aim is to gain Google PR for the target site, which works only if your logs are publicly viewable and crawlable. So it is a total waste of FQ server resources, as well as disfiguring your access logs.)
Plea to Responsible FQ Citizens: Double check that your logs are password protected, and so NOT publicly viewable
I have been adding 'deny from' IPs to my htaccess at the rate of 2-3 a day, as they switch to new ones.
If they are a proxy, they deserve to be blocked for allowing spammers to use them.
Please FQ Staff, can you do something? Blocking log spammers like this still puts a strain on the FQ servers, and it's a problem that is set to increase, especially if ignored.
In addition to wasting CPU resources for FQ, blocking them like this inevitably slows loading of our site.
Could you consider setting up an IP block table (or automatically block the spammy referring urls) at Server config level?
This list is just from the last two weeks...
Check my access logs or ask me if you want the urls.
deny from 67.120.225.185
deny from 64.74.180.128
deny from 81.18.42.114
deny from 213.136.114.105
deny from 24.10.61.187
deny from 61.135.131.173
deny from 203.82.48.55
deny from 204.50.51.117
deny from 65.18.203.170
deny from 80.179.101.
deny from 4.250.102.107
deny from 65.60.198.177
deny from 67.106.152.131
deny from 66.14.134.18
deny from 203.160.252.
deny from 67.18.209.2
deny from 64.62.136.200
deny from 147.83.59.153
deny from 65.50.141.2
deny from 4.156.201.138
deny from 68.109.168.177
deny from 24.147.153.209
deny from 24.222.225.52
deny from 64.152.195.37
deny from 67.185.126.55
deny from 24.76.176.216
deny from 198.87.3.70
deny from 82.3.32.74
deny from 4.156.201.175
deny from 65.103.136.43
deny from 70.96.220.160
deny from 65.191.64.167
deny from 218.85.23.209
deny from 68.61.235.225
deny from 67.20.15.91
deny from 84.43.91.232
deny from 70.176.66.170
deny from 24.215.120.231
deny from 84.43.101.28
deny from 218.6.63.144
deny from 148.223.216.169
deny from 84.43.91.66
deny from 70.183.179.39
deny from 204.10.38.66
deny from 67.170.247.179
deny from 64.122.98.114
deny from 67.161.128.118
deny from 69.50.167.66
deny from 207.255.100.204
deny from 24.17.87.195
deny from 193.95.112.71
deny from 216.104.196.225
deny from 194.247.47.81
deny from 163.28.48.69
deny from 148.244.150.52
deny from 148.223.216.169
Two more new log spamming IPs added to my list today:
deny from 163.28.48.73
deny from 201.144.74.160
Thanks to FQ Staff popping a log scanning script in my account to monitor this. Let me know if I can help in any way.
I'd like to be able to remove this rapidly growing list of IPs denied from my htaccess file as soon as they are blocked at server config level.
Could you let me know when please?
Ta!
Colin
Terra
07-11-2005, 12:07 AM
Plea to Responsible FQ Citizens: Double check that your logs are password protected, and so NOT publicly viewable
Very good advice that warrants being mentioned again! :yeah:
I have been adding 'deny from' IPs to my htaccess at the rate of 2-3 a day, as they switch to new ones.
To get ahead of myself, multiply that by over 10,000 domains with several million referrers daily... You will then see the tide quickly begins to turn...
Please FQ Staff, can you do something? Blocking log spammers like this still puts a strain on the FQ servers, and it's a problem that is set to increase, especially if ignored.
It is definitely not something ignored by us...
Unfortunately, your request (plea) falls into the infamous realm of "the referrer spam blocking concept sounds simple, however the implementation is dreadfully difficult and computationally expensive"
In addition to wasting CPU resources for FQ, blocking them like this inevitably slows loading of our site.
Sadly though, the cure will end up being more expensive than the disease... :(
There are only three real ways to block Referrer Spam:
1) IP based block
2) Referrer string block
3) IP + Referrer tuple
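The three approaches listed above can be sketched as simple lookups. This is only a minimal illustration: the block lists, the `spam.example` host and the function names are hypothetical, and nothing here is FQ's actual implementation.

```python
from urllib.parse import urlsplit

# Hypothetical block lists; each site would have to maintain its own.
BLOCKED_IPS = {"67.120.225.185", "64.74.180.128"}
BLOCKED_REFERRER_HOSTS = {"spam.example"}
BLOCKED_PAIRS = {("81.18.42.114", "spam.example")}


def referrer_host(referrer):
    """Return the lower-cased host of a Referer value, or '' if it has none."""
    return (urlsplit(referrer).hostname or "").lower()


def blocked_by_ip(ip, referrer):
    """1) IP based block: the referrer is ignored entirely."""
    return ip in BLOCKED_IPS


def blocked_by_referrer(ip, referrer):
    """2) Referrer string block: the client IP is ignored entirely."""
    return referrer_host(referrer) in BLOCKED_REFERRER_HOSTS


def blocked_by_pair(ip, referrer):
    """3) IP + Referrer tuple: both must match a listed pair."""
    return (ip, referrer_host(referrer)) in BLOCKED_PAIRS
```

Note that the IP based check would also block a listed IP that arrives with a perfectly legitimate referrer, which is one way false positives creep in.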
I wrote a very simple PoC perl program, that took the IP addresses you listed, and pulled out all the Referrers associated with those IPs...
For reference, look at the following URL:
http://www.FutureQuest.net/referrer_spam/
1) rspam.lst == your listed IPs
2) scan_rspam.pl == scanning program, given list of IPs, will retrieve the referrers and track statistics
3) rspam.txt == the result of the scan
rspam.txt was created from the following execution:
$ zgrep -h "." access.* | ./scan_rspam.pl rspam.lst > rspam.txt
Just looking at the 'Ratio' shows that well under 1% (0.0038) of all requests were referrer spam...
In contrast, if I must scan each and every request, then it becomes computationally expensive and penalizes 100% of requests, regardless of whether each one is valid or contains referrer spam...
Just in your short list of IPs, there were 2750 entries... Now multiply that by over 10,000 domains and you can see where this is going...
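For readers who want to try the same kind of scan on their own logs, here is a rough Python sketch of the idea. It is not scan_rspam.pl (whose source is not reproduced in this thread); it assumes Apache "combined" log lines, and the names `LOG_LINE`, `ip_listed` and `scan` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" format, i.e.
# IP ident user [date] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \S+ \S+ "([^"]*)"')


def ip_listed(ip, entries):
    """Match full IPs exactly; entries ending in '.' act as prefixes."""
    return any(ip == e or (e.endswith(".") and ip.startswith(e)) for e in entries)


def scan(lines, entries):
    """Count referrers on requests from listed IPs and compute the hit ratio."""
    referrers = Counter()
    total = 0
    for line in lines:
        total += 1
        match = LOG_LINE.match(line)
        if match and ip_listed(match.group(1), entries):
            referrers[match.group(2)] += 1
    hits = sum(referrers.values())
    ratio = hits / total if total else 0.0
    return referrers, hits, total, ratio
```

Feeding it the decompressed access log lines and the deny list would give per-referrer counts plus the overall ratio of listed-IP requests to all requests.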
Also of concern: given that there are only three ways to block this type of spam, the incidence of false positives runs pretty high, as I see several referrers you would not normally want blocked:
1) google
2) dmoz
3) msn
4) yahoo
5) insert any false positive here
This alone makes automation difficult, and implementing some sort of manual corpus would also become a tedious daily process if we would have to sort through thousands of potential candidates and vet them...
Could you consider setting up an IP block table (or automatically block the spammy referring urls) at Server config level?
As they say, one person's garbage may be another person's treasure...
Any type of unilateral decision must be very very carefully weighed...
In conclusion, I hope that I have explained the difficulty of handling referrer spam at the server level... There is sadly no 'one-size-fits-all' solution, therefore whatever solution is devised must be scoped to the individual domain...
There is really nothing that I can find that is specifically written to combat this at the Apache level, meaning that we would have to develop our own custom module, along with writing a CNC module to interface with it...
Even then, with a custom written module and GUI, there is still the problem of taxing 100% of all requests to block a very small 0.0038 subset (while looking at the much larger picture)... This is what primarily holds me back from implementing your request... I am extremely resistant to bloat...
Also note, that you can block either by Referer [sic] or IP within your .htaccess file... Please view the following thread:
http://www.aota.net/forums/showthread.php?postid=134490#post134490
--
Terra
--the best step is to write your Senators and Congressmen and demand that they make (obvious) spamming illegal--
FutureQuest
Thanks Terra,
I certainly do appreciate your dilemma, and your quick response.
Rather you than me Mate :)
Please don't take this as criticism, but I just looked at the scan results data of your new script and it missed several spam referers. I think if you check my last two weeks' logs for that IP list, you'll discover it is more than 1%.
And anyway, it only takes one Plasmodium-infected mosquito bite to infect your whole body.
That is why I'm blocking them as soon as I've noticed. In the hope that their automated spam generator will see the 403's and move on. But I posted this here, to alert everyone else otherwise we all suffer at FQ.
I know that I can block on referring URL, but the URL changes quicker than the IP, so I chose IP.
You'll come up with something that avoids bloat, I jus' know you will :)
Colin
Terra
07-11-2005, 12:47 AM
Please don't take this as criticism, but I just looked at the scan results data of your new script and it missed several spam referers. I think if you check my last two weeks' logs for that IP list, you'll discover it is more than 1%.
I ran all of June and July 2005 logs through the scan_rspam.pl script...
If you look at the script, you will see that I maintain the $referrers{$referrer_short}->{'COUNT'} counter, which is incremented based on successfully finding one of the IPs in your list... Then in the 'print' loop, each 'COUNT' is added to the '$total' variable...
As a quick sanity check, I ran the following:
$ zgrep "." access.* | fgrep -f rspam.lst | wc -l
2751
Which pretty much matches the total number that the scan_rspam.pl script found (2750)...
If I have made a logic or programming error somewhere, please let me know and I'll update the script... I'm just not seeing it... :(
--
Terra
--"Just before they went into warp, I beamed the whole kit and kaboodle into their engine room, where they'll be no tribble at all." (Scotty)--
FutureQuest
'Pologies Terra for implying your scripting was dodgy. I defer...
Which pretty much matches the total number that the scan_rspam.pl script found (2750)...
But 2750 calls in two weeks, Aggghh! And this despite daily blocking. It would have been much more if I hadn't been stamping on them.
I'm convinced you'll have a 'eureka moment' soon.
Thanks and Happy Ablutions :)
Colin
Terra
07-11-2005, 02:36 AM
Please don't take this as criticism,
None taken... :)
I went ahead and ran further tests to see if this was a widespread problem, and used the ROCKO server as ground zero...
I checked every account's web logs (on ROCKO including yours) against your IP block list via the following method for the last 2 weeks (6/25 - 7/9):
# for i in x*/logs_web/access.2005{06{2[56789],30},07*}.gz;do echo $i >/dev/stderr; zcat $i;done | fgrep -c -f rspam.lst
4090 (this includes your site's 1462)
Then I obtained the total number of log entries for that same time period
# for i in x*/logs_web/access.2005{06{2[56789],30},07*}.gz;do echo $i >/dev/stderr; zcat $i;done | wc -l
21,042,049
Now we find the ratio of hits to misses:
4,090 / 21,042,049 == 0.000194372705813963
In retrospect, I would need to penalize 5145 requests to find 1 referrer spam (on average) just on ROCKO alone...
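The arithmetic behind those two figures can be rechecked in a couple of lines. The counts are the ones quoted above; the variable names are just for illustration.

```python
spam_hits = 4090            # log entries from listed IPs on ROCKO, 6/25 - 7/9
total_requests = 21042049   # all log entries for the same period

ratio = spam_hits / total_requests             # fraction of requests that matched
requests_per_hit = total_requests / spam_hits  # requests scanned per match

print(f"ratio: {ratio:.6f}")
print(f"requests scanned per spam hit: {requests_per_hit:.0f}")
```

The second figure rounds to 5145, which is where the "penalize 5145 requests to find 1 referrer spam" number comes from.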
--
Terra
--nothing quite invigorates like having to work around an 'Argument list too long' error--
FutureQuest
There are only three real ways to block Referrer Spam:
1) IP based block
2) Referrer string block
3) IP + Referrer tuple
Would it be possible to identify a Referrer Spam attack at the network/firewall level (same IP - changing referrer) and age block the IP as you do with SSH Guardian?
But would you call it Referrer Guardian or Referer Guardian?
--
Don
Terra
07-11-2005, 08:38 AM
Would it be possible to identify a Referrer Spam attack at the network/firewall level
Egad, definitely not... :confuz:
Having to deal with packet reassembly and doing string matches into raw packets, not to mention having to first parse it into a HTTP request... Last thing you want to do is Layer 7 filtering with a Layer 3 firewall...
Firewalls are best left to handling the Layer 3 components/structure of a TCP/IP packet, and not the actual payload itself...
(same IP - changing referrer)
That isn't a good criterion, because referrers are somewhat volatile in nature... It would take a fair amount of A.I. heuristics to determine if a changing referrer is valid or spam... The best that can be done is hashed pattern matching against a defined list, and even that is problematic for high transaction rates due to the scanning overhead... Once again this leads to a 'whack-a-mole' situation as you would have to define and maintain the lists specifically for your site...
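One possible reading of "hashed pattern matching against a defined list" is a set lookup on the referrer's host and its parent domains. The sketch below assumes that reading; the `SPAM_DOMAINS` entries and the function name are hypothetical, and the list would still need per-site upkeep as described above.

```python
from urllib.parse import urlsplit

# Hypothetical, site-specific list; someone has to keep it current by hand.
SPAM_DOMAINS = frozenset({"spam.example", "pills.example"})


def is_listed_referrer(referrer, spam_domains=SPAM_DOMAINS):
    """True if the referrer's host, or any parent domain of it, is listed."""
    host = (urlsplit(referrer).hostname or "").lower()
    labels = host.split(".")
    # Try "a.b.c", then "b.c", then "c"; each attempt is a single hash lookup.
    return any(".".join(labels[i:]) in spam_domains for i in range(len(labels)))
```

Each request costs only a few hash lookups, but it is still a cost paid on every request, and the list only catches domains someone has already spotted.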
But would you call it Referrer Guardian or Referer Guardian?
Not sure, Project Names usually hit me out of nowhere when I least expect it... As it stands right now, this requested project can be considered stalled since I cannot justify the performance hit and development time based on such a low hit ratio as explained above...
But alas, it is an interesting problem, and I'm sure the gerbil wheel in the back of my mind will be spinning from time to time... :safegrin:
--
Terra
--my brain GPF'd just thinking about it--
FutureQuest
I suspect that if each of us being plagued by the problem submitted our lists (I'm not doing IP blocking however, but rather the regex method outlined in the other thread Terra referred to) the sample size Terra got would be much larger.
This has been a horrible ongoing problem for us, so much so that our logs are now useless. They never were available in the clear--they've always been password protected. We get 800-900 referrer hits of spam a day versus our 1500-1600 real visitors a day.
Betsy
In retrospect, I would need to penalize 5145 requests to find 1 referrer spam (on average) just on ROCKO alone...
As it stands right now, this requested project can be considered stalled since I cannot justify the performance hit and development time based on such a low hit ratio as explained above...
But alas, it is an interesting problem, and I'm sure the gerbil wheel in the back of my mind will be spinning from time to time...
Roge! More than adequate answer. Thank you for your diligence.
So we have to put up with it, and watch it grow to the point that the size of the problem does justify intervention.
Is there any way to prevent the junk appearing in our logs by editing the stats log config file?
If so could you give sample lines of code to block an IP or URL?
Something for us to use as a template.
Ignore this request if it is our remit.
I enjoy studying my logs, but not crammed with filthy domain names and marketing spam.
It doesn't just 'wind-up' Rocko's CPU.
Colin
PaulKroll
07-11-2005, 02:51 PM
How about a CNC feature, that lets you check "this referrer was spam" and then it adds the IP address to your root .htaccess deny list, with a comment to denote date saved for later removal? Periodically, a removal script runs and deletes X-week-old denies.
That'd offer up the possibility of a future CNC feature, "trust others' deny lists", which would apply the blocks to any domain that wanted to take "the easy way out" and could accept the chance of false positives. (Maybe helped along by "only block an entry if it appears in three or more accounts' deny lists") (Accounts and not sites, so as to avoid one person's personal reasons for blocking an IP from many of their sites)
I'm not at all demanding. Also, I want a pony. OH! And some peeps! You know, the little candy birds that... (realizes everyone's looking at him) Nevermind.
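The dated-deny idea above could look something like this sketch. Everything here is hypothetical (no such CNC feature exists in this thread): the "# added YYYY-MM-DD" comment convention, the function names and the three-account threshold default are assumptions for illustration. The date goes on its own comment line rather than at the end of the directive.

```python
from collections import Counter
from datetime import date, timedelta

MARK = "# added "  # hypothetical comment convention for recording the date


def add_deny(lines, ip, today):
    """Append a dated comment line followed by a 'deny from' line."""
    return lines + [MARK + today.isoformat(), f"deny from {ip}"]


def expire_denies(lines, today, max_age_weeks):
    """Drop dated comment/deny pairs older than max_age_weeks; keep the rest."""
    cutoff = today - timedelta(weeks=max_age_weeks)
    kept, i = [], 0
    while i < len(lines):
        dated = (lines[i].startswith(MARK) and i + 1 < len(lines)
                 and lines[i + 1].startswith("deny from "))
        if dated:
            added = date.fromisoformat(lines[i][len(MARK):].strip())
            if added >= cutoff:
                kept.extend(lines[i:i + 2])
            i += 2
        else:
            kept.append(lines[i])  # undated lines are never touched
            i += 1
    return kept


def shared_denies(account_lists, min_accounts=3):
    """IPs that appear in the deny lists of at least min_accounts accounts."""
    counts = Counter(ip for ips in account_lists for ip in set(ips))
    return {ip for ip, n in counts.items() if n >= min_accounts}
```

Counting per account (each list is de-duplicated first) follows the suggestion that one person's blocks across many sites should count only once.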
How about a CNC feature, that lets you check "this referrer was spam" and then it adds the IP address to your root .htaccess deny list, with a comment to denote date saved for later removal? Periodically, a removal script runs and deletes X-week-old denies.
Nice idea, but I suspect Terra would say that there are more pressing requirements for the CNC, then there's CNC 'bloat', and FQ Staff having to explain to users, 'What a referrer is'...
Until it grows, I think doing the above manually is the only realistic option.
Can't stop them hitting the server, but it'd be good to stop their hits appearing in our Stats Logs, so if anybody does know how to safely edit the Stats Log config files to edit out an IP or referring url, please tell us.
Colin
sheila
07-12-2005, 03:20 AM
if anybody does know how to safely edit the Stats Log config files to edit out an IP or referring url, please tell us.
Site owners do not have access or permissions that would allow them to edit the stats config files, or to edit the stats files themselves.
The best thing to do in a case like this, would be to process the stats files yourself, so that you can tell the tool that you are using to ignore whatever domains, IP address, and the like that you do not want to see in your stats results.
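If your chosen stats tool has no ignore option, one way to process the logs yourself is to filter a downloaded copy before analysing it. A minimal sketch, assuming Apache "combined" log lines; the function names and the skip lists are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: Apache "combined" format, i.e.
# IP ident user [date] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \S+ \S+ "([^"]*)"')


def keep_line(line, skip_ips, skip_referrer_hosts):
    """False for lines from a skipped IP or with a skipped referrer host."""
    match = LOG_LINE.match(line)
    if not match:
        return True  # leave unparseable lines for the stats tool to handle
    ip, referrer = match.groups()
    host = (urlsplit(referrer).hostname or "").lower()
    return ip not in skip_ips and host not in skip_referrer_hosts


def filter_log(lines, skip_ips, skip_referrer_hosts):
    """Return only the lines worth passing on to the stats tool."""
    return [ln for ln in lines if keep_line(ln, skip_ips, skip_referrer_hosts)]
```

This works on a copy only; the server's own access logs and the default stats stay as they are.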
A month or so ago I had a service desk ticket open about this --kudos btw to FQ folks who sensed my pure frustration regarding the RS problem and didn't react with the same frustration I was experiencing. I hope Bob doesn't mind if I post the message he wrote me regarding it which included a bit about how to filter the logs. I started to implement it, and got side-tracked/confused (I don't remember anymore) and never put it into place. I have a few days off and perhaps I'll allocate some time to do it this week and am happy to report back but would encourage you to also give it a shot, Colin, and let us know how it turns out for you.
Hello Betsy,
We certainly agree that no one would want these entries in their logs. However, from a quick look it appears that over 100 IP addresses are listed in recent logs, which would indicate that a large spambot network is being used and is most likely changing relatively often. That makes any type of firewalling at the network edge virtually impossible.
From what I have read about this type of spam, it is only useful to the spammer if portions of your stats are made publicly available. Making sure no portion of your site's stats or referrer logs is publicly available is your best defense in the long run.
It appears that there may be some more automatic methods to clean these and add entries to your .htaccess file. I found this with a simple search on Google:
http://g-blog.net/user/Gossip/entry/15472
How effective or non intrusive the solution above is I would not know as this has not been tested by me however it may be an area to search for additional solutions.
-Bob
=============
Bob Johnson
Service Rep
FutureQuest, Inc.
http://www.FutureQuest.net
Thanks Sheila, I'll dig out a stats analysis tool and have a go, as you suggest.
I did actually download the FQ Stats config file, it contains lots of helpful notes, but I was out of my depth.
Going by what you say, even if I'd edited it, and uploaded it to my account, it would not have run anyway.
TVB (and Bob): Thanks for the input, but I wouldn't trust any script made by somebody who uses language like that on the net. And I don't mean Perl or Cobol.
Colin
Having a look at awstats.sourceforge.net I spot this interesting stats config modification:
SkipHosts="123.123.123.123"
I'm tempted to try that in my FQ Stats config file...
But I won't.
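For anyone who does run their own AWStats install, a deny list could be turned into a SkipHosts line along these lines. This is a sketch only: the function name is made up, and the REGEX[...] form for prefix entries is my understanding of the AWStats config syntax, so check it against the documentation for your version.

```python
import re


def skip_hosts_line(entries):
    """Build an AWStats SkipHosts line from IPs and 'a.b.c.' style prefixes."""
    parts = []
    for entry in entries:
        if entry.endswith("."):
            # A prefix such as "80.179.101." becomes an anchored regex.
            parts.append("REGEX[^" + re.escape(entry) + "]")
        else:
            parts.append(entry)
    return 'SkipHosts="' + " ".join(parts) + '"'


print(skip_hosts_line(["67.120.225.185", "80.179.101."]))
```

This only hides the hosts from the AWStats reports; the requests still reach the server and still land in the raw logs.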
sheila
07-12-2005, 04:20 PM
Having a look at awstats.sourceforge.net I spot this interesting stats config modification:
SkipHosts="123.123.123.123"
I'm tempted to try that in my FQ Stats config file...
But I won't.
Do you have awstats installed on your site? I'm a bit confused by the stuff you've written. When you wrote "FQ Stats config file" ... that made me think of some server-specific file that FutureQuest sysadmins would put in place. But if you are actually referring to a config file for a 3rd party script, such as awstats, then you would certainly be able to modify that to your liking. That's the whole point of 3rd party scripts!
Well, if I've misunderstood, what can I say but oops!
No, you were correct Sheila, I only have the default FQ stats on my site.
What I was implying, 'tongue-in-cheek', was that that line of code might 'work' if I put it in the FQ stats config file.
Sorry to confuse you.
I'm still looking for a third party stats package.
One that you don't need to be a geek to install and use... and has the F*** word I do like... FREE! :)
Anybody else using a suitable 3rd. party stats analysis tool? (ie. Free and allows IP /referring url filtering).
Colin
Colin,
It might be worth getting over your distaste of the word if it works. The person who made the program was probably frustrated with the problem and it was the first thing that came to mind. Heck, I dislike the word coke but I still drink it.
I bet Jesus used the F word too, which should be reassuring to you.
Betsy
--makes full use of the english language whenever necessary
Source error TVB.
Cursing is sin, and Christ never sinned.
Don't 'bet' on it: count on it.
From out of the heart the mouth speaks.
Finding foul language offensive is not an impediment, but a blessing.
Colin
PS. Health warning: Coke is a molluscicide
CamFraser
07-20-2005, 10:09 PM
Colin, PM or email Chipmunk (http://www.aota.net/forums/member.php?userid=2304), who's written just such a tool. It produces a report and exports IP addresses in htaccess "deny from" format. Chipmunk has posted elsewhere looking for volunteers willing to share and publish referer spam data.
Do some searching here for "SUCKS" and "SURBL" for info on what can happen if committed people share net abuse info. The acronym SUCKS is in the sense of a sucking chest wound, not in any prurient sense. You can count on it. :P
PS. Health warning: Coke is a molluscicide
That almost sounds like a Wasser-ism! Best not mention Coke health issues around nerds. We needs our fuel. :P
CamFraser
07-20-2005, 10:23 PM
Colin, I just read some of your other postings, and felt I should disclose that Chipmunk is both an American citizen and a pre Iraq War military veteran, though by choice flew unarmed aircraft and has always been against the Iraq War. You may feel uncomfortable dealing with someone with the former two types of background.
That's not a criticism - I respect people who know who they are, and stand by their principles!
Thanks for your suggestion CamFraser.
If I understand correctly, the script you mention adds a 'deny from' to your htaccess file, which is what I was doing manually.
The deny from list was growing so fast it was seriously slowing loading of my site, so I removed them all.
What we need now is a script that edits our log files, removing log spammers' entries.
(ie. we allow them to access the site but we don't have to see their junk in our logs.)
I have every confidence that Terra et al will fix this properly in due course.
I appreciate your comments about my previous postings, but I assure you I'm not anti-American.
In fact I collaborate with many Americans in real-life, and on the web. We list the names of all the Veterans Call to Conscience members who petitioned against the Iraq invasion. (And are calling for immediate withdrawal.)
Bush, Blair, and their supporters, are clearly war criminals by their own definition.
What I object strongly to, are war criminals of any nationality posing as Christians and bringing disgrace to the Name above all names.
He said, 'Love your enemy...
All who follow Him do so, and see that it never fails to bring eternal victory.
He will have the Final Word. (And soon)
Colin
CamFraser
07-22-2005, 10:58 PM
The tool is not a script. It's a standalone Windows GUI app that analyzes raw logs, lets you tag legit referers, run WHOIS queries if you're in doubt (referral spammers churn their domain names, thus many are recently registered), and generate a report that you can put on your site for Google to crawl and list. It also tracks IPs and lets you export them in htaccess format. I hear the next version will probably give you a list of IPs that haven't been active recently, so you can cull them from your htaccess list.
Its main goal is to share data in order to expose spammers.
As an example, do you use SpamAssassin? Did you notice how its performance about doubled (or tripled) after FQ upgraded to the version that uses collaborative lists of spammer domain names (http://aota.net/forums/showthread.php?threadid=17768)? That's what happens when enough small groups of concerned citizens unite!
Or in other words, it's a way of shining light on the misdeeds of spammers: Everyone who does evil hates the light, and will not come into the light for fear that his deeds will be exposed. But whoever lives by the truth comes into the light, so that it may be seen plainly that what he has done has been done through God.
The purpose of web access logs is to record everything, so I'd be alarmed if FQ removed anything beyond the bad entries that they already do (http://aota.net/forums/showthread.php?threadid=17137). There's just too much risk of errors. Are you only using the preinstalled online stats? The feature you want (hiding certain referers) should be available in any good offline web access log analysis software. Ask around here for recommendations. There are several free to low-cost apps.
I agree completely with you about Bush, Blair, et al. I like what the new Pope said more than two years ago (in an interview with ZENIT News Services; italics added by me): There were not sufficient reasons to unleash a war against Iraq. To say nothing of the fact that, given the new weapons that make possible destructions that go beyond the combatant groups, today we should be asking ourselves if it is still licit to admit the very existence of a "just war."
As a veteran myself, I find it disturbing that the loudest proponents of this war either chose never to serve in the line of fire themselves (they had "more important" things to do), or served briefly during a period when they were never deployed. They choose to send others into harm's way, when they chose never to do so themselves. They're the vilest form of coward. They're the only ones I call "anti-American".
Stecyk
08-15-2005, 08:00 PM
Hi,
Much earlier in this thread, I saw the following comment:
Plea to Responsible FQ Citizens: Double check that your logs are password protected, and so NOT publicly viewable.
Which logs do I password protect, and how do I password protect?
I see
logs
logs-cgi
logs-email
logs-web
Do they all get password protected? And if so, do I simply use the file manager in the CNC to password protect?
Best regards,
Kevin
Terra
08-15-2005, 08:17 PM
Those (log*) directories are out of your DocumentRoot, therefore not accessible externally via the web...
The directory you should protect is your 'www/stats' directory, as it can provide a blueprint to your site...
--
Terra
--cue Mission Impossible music--
FutureQuest
Stecyk
08-15-2005, 10:58 PM
The directory you should protect is your 'www/stats' directory, as it can provide a blueprint to your site...
Do I just use CNC and the file manager to password protect? Does password protecting the stats directory negatively affect anything else?
I am sure that these are overly simplistic questions, but I am just learning the ropes.
Best regards,
Kevin
Kevin,
Yes the easiest way would be to use the CNC File Manager. In fact this Knowledgebase Article should help...
http://Service.FutureQuest.net/index.php?_a=knowledgebase&_j=questiondetails&_i=186
I would say that the majority have their stats password protected and it is highly recommended to do so. I know of nothing that password protecting your stats would adversely affect except some black hats seeing you are running a potentially exploitable script...
-Bob
Stecyk
08-15-2005, 11:32 PM
Hi Bob,
Thank you for responding.
From the link you provided:
If your package was set up after July 27, 2003 your stats were password protected by default with your account username and password upon account activation.
I joined FQ in late Spring/early summer. Thus, I am covered. I checked that my .htaccess file is present, with some password stuff in it. So it looks as though FQ has already saved me from the nefarious troublemakers.
I am glad I investigated though, because I am now confident that issue has been dealt with. :smile:
Best regards,
Kevin
If anyone reading this, or Kevin, wishes to make sure their stats are password protected, the easiest way is to access your domain stats and see if you are prompted for a username/password combo...
http://example.com/stats/
(Replacing example.com with your actual domain name and appropriate extension)
-Bob
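The same check can be scripted. A minimal sketch using only the Python standard library: it treats an HTTP 401 response as "you are being prompted for a username/password". The function name is made up, and example.com stands in for your own domain, as above.

```python
import urllib.error
import urllib.request


def stats_protected(url, timeout=10):
    """True if the URL answers 401 Unauthorized, i.e. asks for a login."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return False  # the page was served without any credentials
    except urllib.error.HTTPError as err:
        return err.code == 401


# Usage (replace example.com with your actual domain):
# print(stats_protected("http://example.com/stats/"))
```

Network failures (unknown host, timeout) are deliberately not caught, so a broken connection is not mistaken for an unprotected stats directory.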