FutureQuest, Inc.

FutureQuest Community > General Site Owner Support (All may read/respond) > Email & Mailing List Management

11-05-2006, 01:10 PM   Postid: 153239
jr_citizen
Registered User

Forum Notability:
0 pts:
Join Date: May 2005
Location: London, United Kingdom
Posts: 5
Blocking image spam

I've found some great tips in the forums about various techniques to reduce spam. I've been copying my email to a secondary mailbox where I can play with various filters and settings, and analyze the email I'm receiving.

I've noticed that on one account I'm receiving about 50 image-spam messages a week, and SpamAssassin doesn't flag them as spam; some of my other accounts receive more or fewer of these messages. All of them use GIF images, so I looked at how many of the emails I receive have GIF images embedded or attached, and found that 20% of those are valid emails. So I can't simply reject every GIF image, but I want to do some more advanced processing on them.

Here's what I'd like to do:
- Check all incoming messages for attached GIF files
- If it finds a GIF image, run it through GOCR
- Compare the output of GOCR against a watchfile
- If the script finds a match, then redirect to another mailbox, otherwise accept the message

I've found a number of components in the email recipes etc to help me, but I have no experience writing *nix scripts and could really use some help here. OTOH, I have managed to compile Gocr and test it with a number of image spam messages I've received, and found a good measure of success with Gocr. I'd love to use Tesseract, but couldn't get it to compile.

Can anyone help on the script above?
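The four steps listed above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested recipe: it assumes the message arrives on standard input (as with a typical delivery filter), that the `gocr` binary is on the PATH and that your build can read GIF input (if it can't, convert the image first, e.g. with netpbm's `giftopnm`), and that the watchfile is a plain text list of one catch-word per line. The function names, the `watchwords.txt` path and the exit-status convention are all hypothetical.

```python
import email
import subprocess
import sys
import tempfile
from email import policy
from email.message import EmailMessage


def gif_parts(msg: EmailMessage) -> list[bytes]:
    """Decoded bytes of every image/gif part, whether attached or inline."""
    return [
        part.get_payload(decode=True)
        for part in msg.walk()
        if part.get_content_type() == "image/gif"
    ]


def run_gocr(image: bytes) -> str:
    """OCR one image with the gocr CLI (assumed installed and GIF-capable)."""
    with tempfile.NamedTemporaryFile(suffix=".gif") as tmp:
        tmp.write(image)
        tmp.flush()
        result = subprocess.run(["gocr", tmp.name], capture_output=True, text=True)
    return result.stdout


def load_watchwords(path: str) -> list[str]:
    """Read the watchfile: one catch-word or phrase per line, blanks ignored."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip().lower() for line in fh if line.strip()]


def is_image_spam(msg: EmailMessage, watchwords: list[str], ocr=run_gocr) -> bool:
    """True if the OCR text of any GIF contains a watch-word (case-insensitive)."""
    for image in gif_parts(msg):
        text = ocr(image).lower()
        if any(word in text for word in watchwords):
            return True
    return False


def main(watchfile: str = "watchwords.txt") -> int:
    """Read one message on stdin; return 1 = redirect, 0 = accept (hypothetical)."""
    msg = email.message_from_binary_file(sys.stdin.buffer, policy=policy.default)
    return 1 if is_image_spam(msg, load_watchwords(watchfile)) else 0
```

A mail recipe would run the script with `sys.exit(main())` and route the message to the other mailbox when the exit status is 1. Passing the OCR step in as the `ocr` parameter keeps the matching logic testable without GOCR and makes it easy to swap in a different OCR engine later.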
11-08-2006, 08:32 PM   Postid: 153324
CamFraser
Registered User

Forum Notability:
102 pts: Helpful Contributor
Join Date: May 2004
Location: NYC, and thirsting to return to the Midwest
Posts: 114
Re: Blocking image spam

Sounds like you're a tinkerer! Are you locked into using OCR, or would a different approach (that works) be ok?

This filter is FQ-compatible (was written by a former site owner), and has a nifty image test among its anti-spam arsenal.

Last month, I had over two thousand spams with images, and none got thru. I did have one false positive due to an image test misfiring, but it was a low-priority personal email, and it was easily spotted by the offline analysis tool (a mighty powerful Windows GUI app, though its UI is still a little bit clunky). I've now got it configured to skip image tests for the IP of that specific sender.

It also lets you completely block or just moderately score any nations, do a whole bunch of simple header tests, do IP and domain based blocklist tests, and has some very powerful word tests that have pretty much eliminated all my text based stock spams (zero got thru last month).
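IP-based blocklist tests generally follow the DNSBL convention: reverse the octets of the sender's IPv4 address, append the blocklist's zone, and look the resulting name up in DNS; any answer means the IP is listed. The sketch below shows that general technique only, not how IpNation implements its tests; `zen.spamhaus.org` is just an example zone, and IPv6 is not handled.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the blocklist returns any address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record (or any resolver failure) is treated as "not listed".
        return False
```

Treating resolver failures as "not listed" fails open, which fits a score-based filter where one test missing shouldn't by itself cause a rejection.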

The thing I like most is that it saves a complete copy of every message (spam and ham), and the offline analyzer has two sets of tools for spotting any FPs, as well as a tool for reinjecting them directly into your mail queue.

For years I'd been reluctant to do server side killing, due to the risk of FPs and the difficulty of detecting them. I was talked into running this filter in "stats" only mode as a favor, since I'd helped on a previous anti-spam project and at the time I'd just gotten a dedicated server for my wife's site. After a week of using it, I switched it to kill.

I've been using it for a year now, and my killrate has been above 99.5% for more than eight months (the last 6 were above 99.9%). Lowest killrate since I went live has been above 98%. My stress levels are way down.

The author is pretty obsessive about spam, so there are regular updates. I've been promised that the next version will include the "Jayne Cobb commemorative Grenades test" (a tribute to the superb science fiction TV series Firefly).
__________________
"No one could make a greater mistake than he who did nothing because he could do only a little."
- Edmund Burke
11-09-2006, 11:36 PM   Postid: 153346
Randall
Fuzzier than thou

Forum Notability:
1187 pts: A True Crowd-pleaser!
Join Date: Nov 2002
Posts: 9,636
Re: Blocking image spam

I can second that suggestion. The image tests work pretty darn well despite having no idea what the text content is. And considering that image-based validation scripts rely on the limits of OCR to block spambots, I doubt that an OCR solution is going to hold up over the long term.
Quote:
some very powerful word tests that have pretty much eliminated all my text based stock spams (zero got thru last month).
The stock scam tests involve frequent updates to the list of known stock symbols, so if you're the sort of person who procrastinates about installing updates (like me), the killrate tends to drop a few points. Not to mention that I'm usually a version behind on the software itself.
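To illustrate why the symbol list needs frequent updates, here is a rough sketch of one way such a word test could work: pull ticker-like tokens out of the message text and report the ones on a list of known symbols. This is not IpNation's actual rule; the symbol set and the "3 to 5 capital letters" pattern are assumptions for illustration.

```python
import re

# Hypothetical symbol list; this is the part that goes stale without updates.
KNOWN_SYMBOLS = {"ABCD", "WXYZ"}

# Assumption: a ticker-like token is 3 to 5 capital letters as a whole word.
TICKER_RE = re.compile(r"\b[A-Z]{3,5}\b")


def stock_symbol_hits(text: str, symbols: set[str] = KNOWN_SYMBOLS) -> set[str]:
    """Return the known stock symbols that appear as whole uppercase words."""
    return {token for token in TICKER_RE.findall(text) if token in symbols}
```

A symbol that isn't on the list simply never matches, so an out-of-date list quietly lowers the killrate rather than producing errors.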

But it still beats the stuffing out of SpamAssassin.

Randall
__________________
Where's Randall? (temporarily out of order)
11-10-2006, 10:55 AM   Postid: 153356
jr_citizen
Registered User

Forum Notability:
0 pts:
Join Date: May 2005
Location: London, United Kingdom
Posts: 5
Re: Blocking image spam

I'm open to the alternative idea suggested and have been in touch with the team working on the script. Hopefully I can join the test group.

There were some other threads about the use of OCR -- some in favour, some against. Either way I agree it can be a mixed bag, and like anything, settings need to be tweaked to match the kind of spam each account is receiving.

With the types of messages coming in that I want to scan, it seems pretty straightforward -- either the images are valid images, such as cartoons or signatures, or the OCR conversion gave me a good list of catch-words, such as "stocks, trades, Viagra" etc. I liked a plug-in available for SA that does what I'm looking for (called FuzzyOCR), but it's not enabled on the FQ servers.

I know that no matter what measure is employed, some messages will be tagged incorrectly as spam -- that's inevitable. I'm open to any and all ideas though!
11-10-2006, 12:49 PM   Postid: 153364
kitchin
Site Owner

Forum Notability:
1115 pts: A True Crowd-pleaser!
Join Date: Jan 2001
Location: Virginia
Posts: 2,883
Re: Blocking image spam

How about just rejecting image attachments? Give your trusted correspondents another email address to send to.
11-11-2006, 12:26 AM   Postid: 153377
CamFraser
Registered User

Forum Notability:
102 pts: Helpful Contributor
Join Date: May 2004
Location: NYC, and thirsting to return to the Midwest
Posts: 114
Re: Blocking image spam

Quote:
Originally Posted by kitchin
How about just rejecting image attachments?
That was the first approach used, but IpNation uses scores instead of yes/no, so the score was tailored to each user's comfort level. Last summer, the number sneaking thru got high enough that Chipmunk took a close look at a few hundred ham and spam GIFs, and came up with the approach (as Randall outlines) of extracting header properties and letting people write rules based on those.

For example, one of my rules flat out rejects any GIF with an area smaller than 200 pixels (that catches the "ransom note" spams). Another rule adds 50 points (it takes 100 to reject) if there's any GIF with more than 2 "frames" of animation, unless the GIF's area is below a certain size. Yet another rule says that if the area is within a certain range, and the pixel density (i.e. "area divided by number of bytes in the data part of the largest single frame") is at least 7.0, score it by multiplying the density by 6.0, but cap that score at 120 points.

Those are a lot easier to create than explain, plus Chipmunk does most of the fiddly stuff.
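The three rules described above can be written as a small scoring function. This is a hypothetical sketch, not IpNation's rule syntax: it takes the GIF properties as already extracted, and the "certain size" and "certain range" left unspecified in the post are placeholder constants (`MIN_ANIM_AREA`, `DENSITY_AREA_RANGE`) with made-up values. The 200-pixel floor, the 50-point animation score, the 7.0 density threshold, the ×6.0 multiplier, the 120-point cap and the 100-point reject threshold come from the post. Judging a message by its highest-scoring GIF is also an assumption.

```python
from dataclasses import dataclass

REJECT_AT = 100                        # points needed to reject (from the post)
MIN_ANIM_AREA = 10_000                 # hypothetical "certain size"
DENSITY_AREA_RANGE = (5_000, 500_000)  # hypothetical "certain range"


@dataclass
class GifProps:
    area: int       # width * height in pixels
    frames: int     # number of image frames
    density: float  # area / bytes of image data in the largest frame


def score_gif(g: GifProps) -> float:
    """Score one GIF with the three example rules."""
    if g.area < 200:
        return float("inf")  # "ransom note" GIFs: flat-out reject
    score = 0.0
    if g.frames > 2 and g.area >= MIN_ANIM_AREA:
        score += 50
    low, high = DENSITY_AREA_RANGE
    if low <= g.area <= high and g.density >= 7.0:
        score += min(g.density * 6.0, 120)
    return score


def should_reject(gifs: list[GifProps]) -> bool:
    """Assumption: the message is judged on its highest-scoring GIF."""
    return max((score_gif(g) for g in gifs), default=0.0) >= REJECT_AT
```

Because the density rule alone can reach 120 points, a single very dense GIF is enough to reject, while an animated GIF of ordinary density only contributes 50 and needs another test to push it over 100.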

It's the "pixel density" property that's the most relevant in telling the diff between good and bad. Take a look at a bunch of typical spam GIFs, then compare them to good GIFs. The stock spams almost always have a density at or above the high teens. Most good GIFs are less than half that.
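If you want to look at your own GIFs this way, the three properties can be read from the GIF block structure without decoding any pixels. A sketch under stated assumptions: area is taken from the logical screen width × height, each image descriptor block counts as one frame, and a frame's "data part" is the total payload of its LZW data sub-blocks. IpNation's own extractor may define these differently, and truncated files are not handled.

```python
import struct


def _skip_sub_blocks(data: bytes, pos: int) -> tuple[int, int]:
    """Skip a chain of GIF data sub-blocks; return (new_pos, payload_bytes)."""
    total = 0
    while True:
        size = data[pos]
        pos += 1
        if size == 0:  # block terminator
            return pos, total
        total += size
        pos += size


def gif_properties(data: bytes) -> tuple[int, int, float]:
    """Return (area, frame_count, pixel_density) for a GIF file's bytes."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    width, height, packed = struct.unpack_from("<HHB", data, 6)
    pos = 13  # end of header + logical screen descriptor
    if packed & 0x80:  # global colour table present
        pos += 3 * 2 ** ((packed & 0x07) + 1)
    frame_sizes = []
    while pos < len(data):
        block = data[pos]
        if block == 0x3B:  # trailer
            break
        if block == 0x21:  # extension: introducer, label, then sub-blocks
            pos, _ = _skip_sub_blocks(data, pos + 2)
        elif block == 0x2C:  # image descriptor (10 bytes)
            local_packed = data[pos + 9]
            pos += 10
            if local_packed & 0x80:  # local colour table present
                pos += 3 * 2 ** ((local_packed & 0x07) + 1)
            pos += 1  # LZW minimum code size byte
            pos, size = _skip_sub_blocks(data, pos)
            frame_sizes.append(size)
        else:
            raise ValueError(f"unexpected block 0x{block:02x} at offset {pos}")
    area = width * height
    largest = max(frame_sizes, default=0)
    density = area / largest if largest else 0.0
    return area, len(frame_sizes), density
```

Running this over a folder of saved spam and ham GIFs is a quick way to see where your own mail falls relative to the "high teens versus less than half that" split described above.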

The only stuff that gets thru (but is usually caught by other tests), is JPEGs. The good news is those are 2 to 4 times the size of GIFs, so they're still rare, plus Chipmunk thinks there are other image property based ways of detecting them, and just needs more data to analyze.

Even without that test, IpNation's nation blocking tests kill between 75% and 85% of all my spam. Belts and suspenders.

Quote:
Originally Posted by Randall
The stock scam tests involve frequent updates to the list of known stock symbols, so if you're the sort of person who procrastinates about installing updates
True, but Chipmunk credits your habits (and idea tossing with you) as the inspiration for creating a Watcher app that'll automate much of that for everybody. Thanks.

Quote:
Originally Posted by Randall
But it still beats the stuffing out of SpamAssassin.
Agreed! But that did cause me a mental double take while looking at your stuffed animal avatar.


Hey, kitchin, Chipmunk has many times mentioned that it would be cool if you, and/or anybody else even close to your caliber, joined the team.


Gratuitous Firefly quote:
River Tam: "Also, I can kill [spam] with my brain."
__________________
"No one could make a greater mistake than he who did nothing because he could do only a little."
- Edmund Burke
11-11-2006, 09:03 AM   Postid: 153385
Randall
Fuzzier than thou

Forum Notability:
1187 pts: A True Crowd-pleaser!
Join Date: Nov 2002
Posts: 9,636
Re: Blocking image spam

Quote:
True, but Chipmunk credits your habits (and idea tossing with you) as the inspiration for creating a Watcher app that'll automate much of that for everybody. Thanks.
You're welcome. Figure I won't be the only lazy person using the program in the long run, so someone has to do the dirty work.
Quote:
But that did cause me a mental double take while looking at your stuffed animal avatar.
Ahem. We prefer the term "fuzzy" ... stuffing is for teddy bears who can do nothing but sit around all day looking cute.

Fuzzy animal puppets can choose to sit around doing nothing all day (and looking cute as heck doing it, thank you), which makes us a higher lifeform.
Quote:
I liked a plug-in available for SA that does what I'm looking for (called FuzzyOCR), but it's not enabled on the FQ servers.
That's the part that really hurts.

Randall
__________________
Where's Randall? (temporarily out of order)
11-11-2006, 11:46 AM   Postid: 153394
jr_citizen
Registered User

Forum Notability:
0 pts:
Join Date: May 2005
Location: London, United Kingdom
Posts: 5
Re: Blocking image spam

Quote:
Originally Posted by kitchin
How about just rejecting image attachments? Give your trusted correspondents another email address to send to.
This is one idea I've heard passed around a lot; another one in a similar vein is to send people an email that basically "authenticates" them to be on your whitelist; otherwise everyone is blacklisted. I can't recall the exact name of that methodology.

Personally I'm opposed to these methods, and for a business, it doesn't look good at all. For example, you give out your business cards with your email address -- if you gave that to a potential client, you don't want them to have to jump through hoops to be able to email you. Similarly, my friends and family often give out my email address to other friends and family, and I don't want the hassle -- either on them or myself, or having to update a whitelist each time. I appreciate the concept behind these ideas, but I don't think they are particularly good solutions.

I'm just starting to learn about Chipmunk's method; the basic idea is to analyze what's coming in and shape the rules around that. That's definitely going in the right direction. No two accounts will receive the same types of spam, and you have to keep in mind that blocking spam is a defensive measure, so your defenses have to be shaped around the attacks you're receiving. For one of my accounts, I can easily block email from all but two countries, whereas on another, it's just as common to receive a valid email from China as it is from California. On another set of accounts, it's not out of the ordinary to receive valid emails about company filings (which could include stock information), but they would never want information about Viagra. Well, not yet at least.

Going back to my original request (help blocking image spam), that was based on the one type of email I hadn't been able to block out. However, I already had in mind that the solution of using OCR would only last so long -- at some point, spammers would find a way around that as well. I think Chipmunk's method will probably yield a longer-term solution.
11-15-2006, 02:32 PM   Postid: 153599
kitchin
Site Owner

Forum Notability:
1115 pts: A True Crowd-pleaser!
Join Date: Jan 2001
Location: Virginia
Posts: 2,883
Re: Blocking image spam

time warp... the new idea is at the bottom... it's fairly easy to implement...

I'm considering suggesting to some of my clients that we block all email with images. But that would require bouncing the sender a message saying to contact the client for instructions on how to send an image. For that, they would need a private address or some magic words in the subject. Problem is, bounces may just encourage the jerks. And I can't see doing it without bounces, it's just too false-positive.
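The "private address or magic words in the subject" exception could look something like this. It's a hypothetical sketch of the gate decision only: the address and passphrase are placeholders, and actually sending the bounce is left to the mail system.

```python
from email.message import EmailMessage
from email.utils import getaddresses

# Both values are hypothetical placeholders for a client's own settings.
PRIVATE_ADDRESSES = {"photos@example.com"}
MAGIC_WORDS = "blue pelican"


def has_image(msg: EmailMessage) -> bool:
    """True if any part of the message is an image."""
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def image_gate(msg: EmailMessage) -> str:
    """Return "accept" or "bounce" under a block-all-images policy."""
    if not has_image(msg):
        return "accept"
    recipients = {addr.lower() for _, addr in getaddresses(msg.get_all("To", []))}
    if recipients & PRIVATE_ADDRESSES:
        return "accept"  # sent to the private address
    if MAGIC_WORDS in str(msg.get("Subject", "")).lower():
        return "accept"  # sender used the magic words
    return "bounce"
```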

Another problem is what type of attachments to block. Judging by your typical non-techie, I notice they sure like to send me simple text in either HTML or MS Word attachments.

I could tag the subject line with the type of attachments. That might be enough to ID spam without having to open it. Maybe enhance the tags with some of Chipmunk & CamFraser's heuristics, if they seem durable.
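Tagging the subject with attachment types is easy to sketch with Python's standard `email` package. This is a minimal illustration, assuming that "attachment" means any part with `Content-Disposition: attachment` plus any image part (inline images included); the bracketed tag format is arbitrary.

```python
from email.message import EmailMessage


def attachment_types(msg: EmailMessage) -> list[str]:
    """Distinct content types of attachments and images, in message order."""
    types: list[str] = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        is_attachment = part.get_content_disposition() == "attachment"
        if is_attachment or part.get_content_maintype() == "image":
            ctype = part.get_content_type()
            if ctype not in types:
                types.append(ctype)
    return types


def tag_subject(msg: EmailMessage) -> None:
    """Prefix the Subject with e.g. "[image/gif, application/msword]"."""
    types = attachment_types(msg)
    if types:
        subject = str(msg.get("Subject", ""))
        del msg["Subject"]
        msg["Subject"] = f"[{', '.join(types)}] {subject}"
```

Heuristic scores (such as GIF pixel density) could later be appended to the same tag without changing the overall approach.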
11-15-2006, 08:13 PM   Postid: 153611
CamFraser
Registered User

Forum Notability:
102 pts: Helpful Contributor
Join Date: May 2004
Location: NYC, and thirsting to return to the Midwest
Posts: 114
Re: Blocking image spam

Quote:
Originally Posted by kitchin
I could tag the subject line with the type of attachments. That might be enough to ID spam without having to open it. Maybe enhance the tags with some of Chipmunk & CamFraser's heuristics, if they seem durable.
Why don't you contact Chipmunk and bounce that around? By the way, very soon there will be a private (troll-free) online forum to support IpNation, and to bounce around ideas.

Quote:
Originally Posted by Randall
Fuzzy animal puppets can choose to sit around doing nothing all day (and looking cute as heck doing it, thank you), which makes us a higher lifeform.
I near to choked laughing at that!

Thanks for replacing your fuzzy-puppet-in-the-pot avatar - #1 daughter found that a bit perturbing. She much prefers the new one.

Quote:
Originally Posted by Randall
That's the part that really hurts.
I have several online friends who do/did run FuzzyOCR, and the performance hit is so high, most have had to turn it off.
So in terms of running it on FQ, Firefly's M.Book said it best: "not gonna happen".
(have to fill my Firefly quote quota, and that fits perfectly)

Quote:
Originally Posted by jr_citizen
Personally I'm opposed to these methods, and for a business, it doesn't look good at all. For example, you give out your business cards with your email address -- if you gave that to a potential client, you don't want them to have to jump through hoops to be able to email you.
Very well put! Yes, if you solicit something, then the onus is on you to act accordingly.

Many anti-spammers hate "challenge-response" methods near as much as spam.

Quote:
Originally Posted by jr_citizen
I'm just starting to learn about Chipmunk's method; the basic idea is to analyze what's coming in and shape the rules around that. That's definitely going in the right direction. No two accounts will receive the same types of spam, and you have to keep in mind that blocking spam is a defensive measure, so your defenses have to be shaped around the attacks you're receiving.
Wow, you encapsulated that perfectly. Do you write docs? Feel free to offer to help out with the IpNation docs. Personally, I'd rather Chipmunk spend less time on that, more on coding.

I think you'll find it's very easy to adapt to the scenarios you describe. My wife is a professor, and potentially needs to receive email from almost every nation on the planet, which I thought was going to be a tricky problem. Chipmunk suggested I start with the standard ruleset with just a couple of custom "pass" rules (huge negative score if either .edu in the sender's email address, or my wife's name in the To header). Since everyone runs the filter in "stats" mode for a while before switching it to kill mode, that let me try things out risk free. I found that most non-USA email came from fixed IPs, so I now have all those in IpNation's IP "skip" list (that excludes them just from IP based tests).
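The custom "pass" rules and the IP skip list described above can be sketched as two small functions. Everything here is a hypothetical illustration, not IpNation's configuration format: the `-1000` score stands in for "huge negative score", the name and IP are placeholders, and ".edu in the sender's email address" is interpreted as the sender's domain ending in `.edu`.

```python
from email.message import EmailMessage
from email.utils import parseaddr

PASS_SCORE = -1000              # hypothetical "huge negative score"
PROTECTED_NAME = "Jane Doe"     # hypothetical stand-in for the recipient's name
IP_SKIP_LIST = {"192.0.2.10"}   # hypothetical fixed IPs of known correspondents


def pass_rule_score(msg: EmailMessage) -> int:
    """Huge negative score for .edu senders or mail addressed to the protected name."""
    _, sender = parseaddr(str(msg.get("From", "")))
    domain = sender.rpartition("@")[2].lower()
    if domain.endswith(".edu"):
        return PASS_SCORE
    if PROTECTED_NAME.lower() in str(msg.get("To", "")).lower():
        return PASS_SCORE
    return 0


def ip_test_score(ip: str, ip_tests) -> int:
    """Run IP-based tests unless the IP is skip-listed; other tests still run."""
    if ip in IP_SKIP_LIST:
        return 0
    return sum(test(ip) for test in ip_tests)
```

Keeping the skip list scoped to IP-based tests means a known sender's mail is still checked by the word and image tests.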

It's been a few months since our last work FP. Almost all FPs are low priority emails from friends who do unsmart things like forward a URL that's on a spam blocklist.
__________________
"No one could make a greater mistake than he who did nothing because he could do only a little."
- Edmund Burke

All times are GMT -4.

Images & content copyright © 1998-2010 FutureQuest, Inc.