Can we get tighter server monitoring?


dank
06-23-2006, 01:03 PM
Judging from other threads, it seems I'm not alone in witnessing massive bandwidth usage increases due almost entirely to various forms of spam. My bandwidth has fully doubled in the past year, despite realistic site traffic actually being down.

I've been going through and banning IPs as the obvious ones come up, but I fear that's a losing battle, as I'll always be a step behind, even if I commit several hours a day to analyzing logs. Yesterday, I had a handful of obviously fake visitors account for 3-7 MB apiece. Sure, I can remove the worst offenders, but what about the other 100+ chewing up 0.5-1 MB each? No way can I get all of those...
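
For anyone wanting to automate that per-visitor tally, here is a minimal Python sketch. It assumes the logs are in Apache common or combined format; the function name `bytes_per_ip` and the regular expression are illustrative, and lines that do not match (for example, requests containing escaped quotes) are simply skipped.

```python
import re
from collections import Counter

# Matches the start of an Apache common/combined log line:
# host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')


def bytes_per_ip(lines):
    """Sum the response bytes logged for each client IP."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a recognizable log line
        ip, size = match.groups()
        if size != "-":  # "-" means no body was sent
            totals[ip] += int(size)
    return totals
```

Calling `bytes_per_ip(open("access_log")).most_common(20)` would list the twenty heaviest addresses, which is roughly the manual review described above.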

I've got to think there's a way the Server Guardian or some other such thing can help crack down on this. When I compare stats on two different sites, I see the same IP popping up dozens of times in short succession. Is there any way a master blacklist can be compiled in real time, sort of a SpamAssassin for site visitors? I know it's got the potential to backfire if dialed in too aggressively, but I'd be willing to bet there's a safe zone somewhere in the middle.

Right now, I'm faced with the decision of purchasing extra bandwidth or removing content to balance out usage with the ghost town. If bandwidth availability is the precious commodity Deb claims it to be, I would hope this extreme waste is being taken seriously.

Dan

Terra
06-23-2006, 02:35 PM
We do monitor the servers and investigate all high loading situations, and many times that is how we find the abusers, especially the referrer spammers that pin the blogs...

On the flip side though, it is not our responsibility to dictate or censor who visits your site and who can't unless that visitor is harming the server (as a whole)... Our responsibility is to ensure we provide your site with the highest availability possible to any and all visitors... We also run nightly bandwidth reports, and we do notify clients where we see unusually high bandwidth spikes...

On the tech side, I have tried to attack the referrer spamming problem at the server level, but unfortunately the lists that were built were adding so much overhead that they had to be pulled, since each request must be inspected against an ever-growing list... In the end - this is an application problem and not a server problem... The blogs of the world need to unite and find a common solution to the spamming problem which is completely and utterly out of control... There was once a day that SMTP was the golden child of the internet, but spammers found a way to abuse it into oblivion - the same vicious cycle is happening to blogs...

We handle millions of requests a day (per server), and this would put the burden on us to classify what is valid traffic and what isn't... This really needs to be managed by the site owner, as only they can really decide what is garbage and what isn't... For our part, we offer you the tools and capability to block the garbage if you so choose... I really cannot envision a way to do it safely at the server level that would be a universal solution for all...

--
Terra
sysAdmin
FutureQuest

dank
06-23-2006, 02:55 PM
I was referring to a different site than the blog issue... They're hitting everything: forums, contact forms, dynamic directories, and static index pages.

Couldn't there be some sort of trigger in the daily logs analysis that, say, any IP address which is logged 100+ times gets cross-checked against visits to other sites, and if it shows up for several (and isn't a known, valid spider), it's considered unwanted? It would mean a day delay in filtering out the garbage, but a lot of it looks to be repeat, so it would be better than nothing. And it wouldn't require real-time monitoring of every one of the millions of daily hits.
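
A minimal Python sketch of that daily cross-check, under stated assumptions: the 100-hit threshold comes from the paragraph above, while the "several sites" cutoff (`MIN_SITES`), the whitelist contents, and the function name `flag_unwanted` are hypothetical. It expects the per-site hit counts to have been extracted from the logs already.

```python
from collections import defaultdict

HIT_THRESHOLD = 100             # "logged 100+ times", from the post
MIN_SITES = 3                   # assumed meaning of "several" sites
KNOWN_SPIDERS = {"192.0.2.10"}  # placeholder address for a valid spider


def flag_unwanted(hits_by_site):
    """hits_by_site maps site name -> {ip: hit count} for one day of logs.

    Returns the IPs that reached HIT_THRESHOLD on at least MIN_SITES
    sites and are not on the spider whitelist.
    """
    busy_sites = defaultdict(set)
    for site, counts in hits_by_site.items():
        for ip, hits in counts.items():
            if hits >= HIT_THRESHOLD:
                busy_sites[ip].add(site)
    return {
        ip
        for ip, sites in busy_sites.items()
        if len(sites) >= MIN_SITES and ip not in KNOWN_SPIDERS
    }
```

The result is only as good as the spider whitelist, and a busy proxy or NAT address would look the same as a single abusive visitor.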

Dan

phppete
06-23-2006, 03:15 PM
Originally posted by dank:
"Couldn't there be some sort of trigger in the daily logs analysis that, say, any IP address which is logged 100+ times gets cross-checked against visits to other sites, and if it shows up for several (and isn't a known, valid spider), it's considered unwanted? It would mean a day delay in filtering out the garbage, but a lot of it looks to be repeat, so it would be better than nothing. And it wouldn't require real-time monitoring of every one of the millions of daily hits."

Wouldn't the above include valid and much welcomed SE spiders?

On another note, it's possible to block spam bots or other non-humans by not showing the page content unless a session has been started, and only starting a session if the visitor is using a browser (user agent detection). I do this on forms to prevent non-humans from even seeing the form to begin with. You could easily block a whole site from non-humans with this method.
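
A rough sketch of that gating logic in Python, for illustration only. The marker list, the function names, and the dict standing in for session storage are assumptions, and the form markup is a placeholder.

```python
# Assumed marker substrings; real detection would need a maintained list.
BROWSER_MARKERS = ("Mozilla/", "Opera/")


def looks_like_browser(user_agent):
    """True if the User-Agent header contains a known browser marker."""
    return any(marker in (user_agent or "") for marker in BROWSER_MARKERS)


def render_form_page(user_agent, session):
    """Start a session only for browser-like agents; show the form only
    when a session exists. `session` is a dict standing in for real
    session storage."""
    if not session.get("started") and looks_like_browser(user_agent):
        session["started"] = True
    if session.get("started"):
        return "<form>...</form>"  # placeholder for the real form markup
    return ""
```

Any client that sends a browser-like User-Agent string passes this check, so it filters only the bots that do not bother to disguise themselves.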

Terra
06-23-2006, 03:41 PM
I have looked at that scenario before, and the problem with it is proxies, caches, and NATs, where there are thousands of visitors coming in from behind them... Blocking by IP is next to impossible on such a large scale, which means we have to look at the referrer field, which unfortunately is computationally expensive...

You can however write scripts that can perform that task for your site, and if you see anything that is blatantly obnoxious you can distribute those blocks to all of your sites in the form of .htaccess (deny from xxx.xxx.xxx.xxx)... I still have a problem with blocking access to other sites, which is why I won't do this globally unless it is shown that the visitor is harming the server as a whole, and even then I have to be extremely careful that I'm not blocking valid proxies, etc...
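
As one way such a script might look, here is a minimal Python sketch that turns a list of addresses into the `deny from` lines mentioned above. The `order allow,deny` / `allow from all` preamble is an assumption about how the rest of the .htaccess is set up, and the function name `htaccess_block` is illustrative.

```python
import ipaddress


def htaccess_block(ips):
    """Return Apache access rules denying each valid address in ips."""
    lines = ["order allow,deny", "allow from all"]
    for raw in sorted({ip.strip() for ip in ips}):
        try:
            addr = ipaddress.ip_address(raw)
        except ValueError:
            continue  # skip anything that is not a literal IP address
        lines.append(f"deny from {addr}")
    return "\n".join(lines) + "\n"
```

The same output can be written to the .htaccess file of each site you control, which keeps the decision about what counts as garbage with the site owner.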

What you are asking for would be fine if we were only dealing with a handful of domains, but when we start dealing with tens of thousands of domains, it just doesn't scale very well in a universally accepted sense...

--
Terra
sysAdmin
FutureQuest

Matt
06-23-2006, 04:59 PM
Originally posted by phppete:
"On another note, it's possible to block spam bots or other non-humans by not showing the page content unless a session has been started, and only starting a session if the visitor is using a browser (user agent detection). I do this on forms to prevent non-humans from even seeing the form to begin with. You could easily block a whole site from non-humans with this method."

Interesting. I would assume that form content is also being blocked from search engines. Is that correct? This, of course, assumes that people are playing nice (i.e. not forging user agent strings).

Dan, you raise a good point, which is that large hosts are better positioned to be able to detect behavior patterns. I think Terra's points are also very good ones. How do you distinguish a search engine that is spidering sites from a nuisance user that exhibits similar behavior? By IP address? By referrer? User-agent? All of these must be vigilantly maintained, otherwise you get the other sector that's not getting enough traffic complaining. A solution that (I think) would meet both requirements would be an optional service that maintains records of suspect visitors. The way it works is like this: you get a visitor to your site; before displaying content, you pass IP, user-agent, and referrer along to the centralized server. The centralized server responds with a "SPAM" score. You can then decide how best to act. I imagine such a service would be optional (and billed as a separate service) as it would require its own infrastructure.

This may seem like overkill now, BUT, I suspect that with the increasingly sophisticated web sites out there (and the proportional risk of being the target of an attack), hosting companies will ultimately have to provide some defense mechanism. The nice thing about a centralized server is that the customer can decide the best way to respond (which may be letting FQ apply some default action). Imagine this scenario: a hacker tries unsuccessfully to hack a web service I have built. After 5 unsuccessful attempts, I report the user's IP, user-agent, and referrer to FQ's security logging server and disable the visitor's site access. The hacker then attempts to hack a web service on Dan's site. He queries the security logging server which reports the user has already tried unsuccessfully to access another FQ site. The visitor is then blocked from accessing Dan's site (if Dan decides this is the best course of action).
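
The report-then-query flow in that scenario could be sketched roughly as follows. This is an in-memory stand-in, not a real service: the class and function names are hypothetical, it keys on IP only (the proposal also passes user-agent and referrer), and the "score" is simply the number of distinct sites that have reported the address.

```python
from collections import defaultdict


class ReputationLog:
    """In-memory stand-in for the central security logging server."""

    def __init__(self):
        self._reports = defaultdict(set)  # ip -> sites that reported it

    def report(self, ip, site):
        """Record that `site` saw suspicious activity from `ip`."""
        self._reports[ip].add(site)

    def score(self, ip):
        """Number of distinct sites that have reported this IP."""
        return len(self._reports.get(ip, ()))


def should_block(log, ip, threshold=1):
    """Each site owner picks their own threshold for acting on a score."""
    return log.score(ip) >= threshold
```

Counting distinct reporting sites rather than raw reports means one site cannot raise an address's score by reporting it repeatedly, though it does nothing against several colluding reporters.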

Applying things like this isn't easy. Something more feasible along these lines that I would like to see is e-mail filtering that acts like this: if several accounts receive the same e-mail, the e-mail is deleted (or marked as spam). I think this would go a long way toward eliminating a LOT of spam in a pretty safe way (things like newsletter subscriptions could be whitelisted by users).
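
A minimal sketch of that duplicate-message rule, assuming "the same e-mail" means a byte-identical body and that whitelisting is done by sender address. The class name, the cutoff of three recipients, and the whitelist mechanism are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class DuplicateMailFilter:
    def __init__(self, max_recipients=3, whitelist=()):
        self.max_recipients = max_recipients  # assumed cutoff for "several"
        self.whitelist = set(whitelist)       # senders approved by users
        self._seen = defaultdict(set)         # body digest -> recipients

    def is_spam(self, sender, recipient, body):
        """Flag a message once its body has reached too many accounts."""
        if sender in self.whitelist:
            return False
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self._seen[digest].add(recipient)
        return len(self._seen[digest]) > self.max_recipients
```

An exact hash only catches identical copies, so a message that varies even slightly per recipient would slip through.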

-Matt

Randall
06-23-2006, 06:15 PM
Originally posted by Matt:
"Something more feasible along these lines that I would like to see is e-mail filtering that acts like this: if several accounts receive the same e-mail, the e-mail is deleted (or marked as spam)."

Wasn't that the original concept behind Brightmail? Or maybe I'm thinking of another company. In any event, I have to think that the spammers are doing everything they can to randomize their junk, to the point where this kind of filtering would be pretty much useless by now.

Originally posted by Matt:
"Imagine this scenario: a hacker tries unsuccessfully to hack a web service I have built. After 5 unsuccessful attempts, I report the user's IP, user-agent, and referrer to FQ's security logging server and disable the visitor's site access. The hacker then attempts to hack a web service on Dan's site. He queries the security logging server which reports the user has already tried unsuccessfully to access another FQ site. The visitor is then blocked from accessing Dan's site (if Dan decides this is the best course of action)."

Would this necessarily have to be provided by FQ? Sounds like something an enterprising individual could offer from anywhere, since essentially you're just storing information in a database and responding to queries, à la the email RBLs. The UI for reporting suspicious activity could be a PHP script running on the user's own site, and the lookup function embedded in the site's code.

The computationally expensive work, as Terra puts it, would be done by individual users of the service as opposed to the network as a whole. And if done intelligently, with local caching of previous queries, the bandwidth involved could be minimized.
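
The local caching idea could look something like this minimal Python sketch. The class name, the one-hour default lifetime, and the `remote_lookup` callable (standing in for whatever query the central service would expose) are assumptions.

```python
import time


class CachedLookup:
    """Wrap a remote score lookup with a simple time-limited local cache."""

    def __init__(self, remote_lookup, ttl_seconds=3600, clock=time.monotonic):
        self._remote = remote_lookup  # callable: ip -> score
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}              # ip -> (expiry time, score)

    def score(self, ip):
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and hit[0] > now:
            return hit[1]             # still fresh, no remote query
        value = self._remote(ip)
        self._cache[ip] = (now + self._ttl, value)
        return value
```

With this in place, a repeat visitor costs one remote query per cache lifetime instead of one per page load, at the price of acting on a score that may be up to `ttl_seconds` old.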

Just thinking off the top of my head here -- could be any number of technical/business/legal reasons why it wouldn't work, or just be too expensive to do. :safegrin:

Randall

Matt
06-23-2006, 06:29 PM
Yes, Randall, it wouldn't have to be central to FQ. It shouldn't be too computationally intensive either, since you simply supply PHP and ASP code to your customers and they actually handle the determination of whether to report suspicious activity. Responding to a query would be a database call away. The biggest drawback would be the risk of intentional poisoning of the database (what I was afraid might happen with some of FQ's security mechanisms). Not sure how you protect against this except by charging a lot for the service and implementing stiff penalties for abuse.

-Matt

Back to Dan's original post, this is something that I think some of the major porn hosting companies have tackled. Dan, you could always move your site to one of those servers ;)

dank
06-23-2006, 08:28 PM
Oh great, now we're turning to the spammers for spam protection...

I'd be concerned with a non-local approach, as you're dependent on another server's response for each and every page load.

I was thinking sort of along the lines of where Matt was headed, in terms of it being something of an opt-in application. If FQ had a master list compiled and regularly updated, it could perhaps be an option in the CNC to "enable at your own risk," just like the various email tools. Would there then be a way to [relatively easily] enable/disable it on an account by account basis?

If the logistics could be worked out, it seems it would greatly be in FQ's interest to do so. There's a ton of wasted bandwidth going out the door right now. In the case of my site currently in question, it's typically been using 5-6 GB per month on a 10 GB package. That leaves a fairly large chunk of the allotment unused, freeing up the resources for others. Now, with less site traffic, that number is up to 11 GB and climbing, meaning there's a lot less to go around for others and nothing's been gained for it. What we've got right now is a losing situation for everyone.

Dan

phppete
06-24-2006, 02:22 AM
Originally posted by Matt:
"Interesting. I would assume that form content is also being blocked from search engines. Is that correct? This, of course, assumes that people are playing nice (i.e. not forging user agent strings)."


Yes, that is correct, but another way to guard against forged user agent strings is to look for a valid XMLHttpRequest object when the user visits the site: upon successful object detection start a session, otherwise don't, and then gate the content the same way as before. Probably not ideal in many situations, but if it blocks forms from spammers then it does have its uses. Obviously I'm not suggesting stealth as a way of securing forms; you would still use proper server-side input validation, of course.

dank
06-30-2006, 01:30 PM
So is that it, topic settled? No response to Matt's input?

I don't see why FQ has such a hard time understanding why I've grown increasingly unhappy here. Every time an issue of the sort comes up, it's been made clear that, despite what's said, FQ's interests are in protecting itself, not the customers. Even when the customers try to help out FQ overall, they're seemingly left out in the cold. All we get is lip service that our opinions matter greatly...

I've upgraded the package in question because my own attempts at curbing the bandwidth abuse could not bring it in check, but that is far from a solution. I generally think of FQ as being on the technical cutting edge, so if they haven't come up with something workable, I don't hold out a huge amount of hope that anyone else has (outside the porn industry), but it's something I'll have to look into. The current situation and attitude is not acceptable to me, and shouldn't be to anyone else concerned with waste. In one thread, Deb tells me that a few extra gigs here and there are of the make or break it variety, then I'm told in this thread that 5 GB per month of pure waste are not worth worrying about.

I guarantee you, the problem will only get worse.

Dan

Terra
06-30-2006, 01:42 PM
I am sorry you feel that way... I gave you the best responses I had to your request in:

http://www.aota.net/forums/showthread.php?postid=149309#post149309
and
http://www.aota.net/forums/showthread.php?postid=149313#post149313

What you are asking for, I cannot provide you... The onus is upon you to monitor your site and block the undesired traffic - we cannot make that decision for you...

On the flip side, I am a bit peeved with all the referrer spam, and at least there are genuine patterns there that I can lock on to and set up global blocks... The kind of traffic that you received that in turn jacked up your bandwidth usage is next to impossible to programmatically categorize, and the closest solution would be like a SpamAssassin or Bayesian filter for Apache - and even then there are a lot of false positives, meaning perfectly valid Apache traffic would be blocked unnecessarily... We also simply do not have the resources to park a human or two to monitor the traffic and try to decide what is good or bad before implementing the block... Even if we could, we still run afoul of proxies which cannot be blocked...

I'm sorry, but what you are asking for simply doesn't exist, nor do I see a way to design or implement the heuristics necessary to categorize the traffic on a global scale...

--
Terra
sysAdmin
FutureQuest