06-10-2002, 12:00 PM
|
Postid: 69305
|
|
Site Owner
Join Date: Mar 2002
Location: Scottsdale, AZ
Posts: 177
|
EFM and HTML
This has probably been answered before, so I apologize for asking...
Does EFM disregard everything contained within HTML tags in a message..? The only spam which seems to be avoiding my filters is that HTML formatted crap. EFM is not catching the "banned words."
If so, is there a workaround..?
Thanks..!
|
|
|
06-10-2002, 01:19 PM
|
Postid: 69308
|
|
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,678
|
I've seen spam senders getting clever in this area by using inlined comments that will muck up filters...
Code:
Win 10 mi<!-- blah -->llion doll<!-- gibberish -->ars Now!!!
The filter is looking for 'million' and not the text that is broken up by the inline comments - however your email client should render it as 'million' with the comment gone (as it should by HTML rules)...
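To make the mismatch concrete, here is a minimal sketch. It is not EFM code; the `BANNED` word and the regular expression are assumptions for illustration only. It compares a plain substring search on the raw message with a search on the text after the comments are deleted, which is roughly what a mail client ends up showing.

```python
import re

# Illustrative values only: one banned word and the sample line from above.
BANNED = "million"
raw = "Win 10 mi<!-- blah -->llion doll<!-- gibberish -->ars Now!!!"

# A plain substring search sees the comment markup, so the word never appears.
naive_hit = BANNED in raw.lower()

# Deleting each comment outright (nothing put in its place) rejoins the pieces,
# which approximates what an HTML-capable client displays.
rendered = re.sub(r"<!--.*?-->", "", raw, flags=re.DOTALL)
rendered_hit = BANNED in rendered.lower()

print(naive_hit)     # False
print(rendered)      # Win 10 million dollars Now!!!
print(rendered_hit)  # True
```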
Looks like the spammers have just raised the stakes once again requiring more complexity to combat...
--
Terra
--Spam Assassin--
FutureQuest
|
|
|
06-10-2002, 02:01 PM
|
Postid: 69312
|
|
Site Owner
Join Date: Feb 2002
Location: Denver, Colorado
Posts: 865
|
Quote:
Originally posted by Terra:
I've seen spam senders getting clever in this area by using inlined comments that will muck up filters...
The filter is looking for 'million' and not the text that is broken up by the inline comments - however your email client should render it as 'million' with the comment gone (as it should by HTML rules)...
Looks like the spammers have just raised the stakes once again requiring more complexity to combat...
|
Ugh!! I really don't get why spammers work so hard to defeat people's filters. Isn't it obvious that if someone works really hard to filter out spam, they're not going to be a receptive audience to those few messages that get through? I mean, these guys must have a death wish. I will not shed any tears if some of these spammers are tracked down and made to "pay" for their scornful abuse of other people's time, money, and privacy.
In the meantime, the following enhancement to EFM would greatly help to counter this latest spammer trick:
Many of us have friends, customers, etc. who choose to send us HTML-formatted email. These people we can place in our EFM White Lists to ensure receipt of their email. For people we don't know, however, it would be nice to filter out HTML messages. So it would help to have a checkbox in the Black Lists section that says "Ban HTML email".
How 'bout it Tom & Sheila? Would this be fairly easy to add?
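For what it's worth, here is a rough sketch of what such a check might look like, using only Python's standard `email` package. Nothing here comes from the EFM source: the function names `has_html_part` and `should_ban_html`, the `white_list` set, and the sample messages are all made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr


def has_html_part(msg):
    """True if the message, or any MIME part inside it, is text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def should_ban_html(raw_message, white_list):
    """Hypothetical 'Ban HTML email' rule: White List senders always pass."""
    msg = message_from_string(raw_message)
    # Pull the bare address out of a header such as 'Name <user@example.com>'.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in white_list:
        return False
    return has_html_part(msg)


# Made-up sample messages for the sketch.
html_mail = (
    "From: Stranger <stranger@example.com>\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<html><h3>Win 10 Million Dollars</h3></html>\n"
)
plain_mail = "From: stranger@example.com\n\nJust plain text.\n"

white_list = {"friend@example.com"}
print(should_ban_html(html_mail, white_list))   # True
print(should_ban_html(plain_mail, white_list))  # False
```

A real version would also have to decide what to do with multipart messages that carry both a plain-text and an HTML part; this sketch treats any HTML part as grounds for a ban.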
Thanks from your humble open source customer,
__________________
Scott
|
|
|
06-10-2002, 02:34 PM
|
Postid: 69316
|
|
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,678
|
Quote:
|
humble open source customer
|
Do you yourself have the capability to patch the code and submit the changes to Tom and Sheila?
I figure both of them are extremely swamped, and the beauty of Open Source software is that you don't have to wait on others when you can do it yourself...
--
Terra
--The Bazaar--
FutureQuest
|
|
|
06-10-2002, 03:06 PM
|
Postid: 69318
|
|
Site Owner
Join Date: Feb 2002
Location: Denver, Colorado
Posts: 865
|
Quote:
Originally posted by Terra:
Do you yourself have the capability to patch the code and submit the changes to Tom and Sheila?
I figure both of them are extremely swamped, and the beauty of Open Source software is that you don't have to wait on others when you can do it yourself...
|
I could take a look at the code. But I haven't done any PHP or Python, so I'd definitely be hacking my way through it.
__________________
Scott
|
|
|
06-10-2002, 03:34 PM
|
Postid: 69319
|
|
Site Owner
Join Date: Mar 2002
Location: Scottsdale, AZ
Posts: 177
|
Quote:
Originally posted by Terra:
I've seen spam senders getting clever in this area by using inlined comments that will muck up filters...
Code:
Win 10 mi<!-- blah -->llion doll<!-- gibberish -->ars Now!!!
|
No, it's not complex coding at all. Something like:
<HTML>
<H3>Win 10 Million Dollars</H3>
</HTML>
will pass the filter looking for million.
|
|
|
06-10-2002, 04:28 PM
|
Postid: 69320
|
|
Site Owner
Join Date: Aug 1999
Location: Metro Los Angeles Area
Posts: 7,398
|
LJ,
The EFM filter does strip out HTML tags before searching for the match words that you are filtering on. However, it is possible that in writing our HTML tag-stripping routine, we didn't think of something, or that the spammers have thought of something new. Please send complete copies of the messages that are not being caught by the filter as you expect, along with a copy of your filters and a brief comment or description. If you could zip it up and attach it to an email and send it along, I will look at it when I get the chance.
In theory, our filter should even be catching the example that Terra is showing.
I have, however, seen that lots of JavaScript in the message will sometimes defeat our tag-stripping routines, and have not had the chance to really work on that.
|
|
|
06-10-2002, 05:37 PM
|
Postid: 69321
|
|
Site Owner
Join Date: Mar 2002
Location: Scottsdale, AZ
Posts: 177
|
Quote:
Originally posted by sheila:
LJ,
The EFM filter does strip out HTML tags before searching for the match words that you are filtering on. However, it is possible that in writing our HTML tag-stripping routine, we didn't think of something, or that the spammers have thought of something new. Please send complete copies of the messages that are not being caught by the filter as you expect, along with a copy of your filters and a brief comment or description.
|
Sorry, but I've reported them through SpamCop and tossed them. Next time one gets through, I'll try to do what you suggest.
|
|
|
06-10-2002, 09:34 PM
|
Postid: 69326
|
|
Site Owner
Join Date: Aug 1999
Location: Metro Los Angeles Area
Posts: 7,398
|
Quote:
|
In theory, our filter should even be catching the example that Terra is showing.
|
Upon further reflection, I must recant that statement. Our algorithm strips out the tags, replacing each one with a single space, and then reduces multiple consecutive spaces to a single space. A word that is split by a comment therefore ends up as two fragments with a space between them, so the filter would not catch something like Terra's example.
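The behavior described here can be sketched roughly as follows. This is a reconstruction from the description only, not the actual EFM routine; the function name `strip_tags_with_space` and both regular expressions are assumptions.

```python
import re


def strip_tags_with_space(text):
    """Replace each <...> run with one space, then collapse repeated spaces."""
    spaced = re.sub(r"<[^>]*>", " ", text)
    return re.sub(r" {2,}", " ", spaced)


# Terra's example: the comments sit inside the words.
split_word = "Win 10 mi<!-- blah -->llion doll<!-- gibberish -->ars Now!!!"
stripped = strip_tags_with_space(split_word)
print(stripped)                       # Win 10 mi llion doll ars Now!!!
print("million" in stripped.lower())  # False

# Tags that only wrap whole words do not hurt the match.
wrapped = strip_tags_with_space("<H3>Win 10 Million Dollars</H3>")
print("million" in wrapped.lower())   # True
```

Under these assumptions, the space that replaces each tag is exactly what keeps "mi" and "llion" apart.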
Any time you have stuff that seems to be working "wrong" and you would like us to look into it, we need to be able to reproduce the error in order to figure it out and fix it. The easiest way to do this is to have the data that is known to cause the error. So, please do send error-causing emails and corresponding filters when possible.
And I'd like to second the other thing Terra said: If you are able to send patches, they would be most appreciated.
--swamped. sounds like an understatement. more like drowning. 
|
|
|
11-05-2003, 10:25 AM
|
Postid: 99662
|
|
Registered User
Join Date: Aug 2003
Posts: 62
|
Sorry to resurrect this thread from the past, but it's the closest one I've found to answering a question I have on EFM's spam detection.
I have the phrase "would you like to be financially independent" in my EFM's (vers. 1.11 beta) Banned Message Words filter. Problem is, the spammers have taken to embedding random HTML comment tags into the message thusly:
"Wo<!Mf>uld yo<!Xt>u li<!ygR>ke t<!GIX>o b<!jTW>e Financ<!TX>ially Indep<!MgCq>endent"
EFM doesn't find a match, but when displayed in an HTML-capable email client, you see the message just fine, since the tags are ignored.
According to the last entry in this thread from Sheila, HTML tags are replaced with a space. The EFM filter script has a "strippedstring(inputstring)" function defined. Line 693 is:
returnstring += ' ' + inputstring[:openindex]
I'm no expert in Python, but what would happen if the space in the concatenation is eliminated? Wouldn't that just delete the HTML tag completely?
(I thought I'd ask first before I screw up my whole email system by making the change.)
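One way to try the idea without touching a live install is a small stand-alone experiment. The function below is a hypothetical stand-in written for this test, not the real EFM `strippedstring`: only the quoted line is taken from the script, and the loop around it, the `filler` parameter, and the final whitespace cleanup are assumptions.

```python
def strippedstring(inputstring, filler=" "):
    """Hypothetical stand-in: drop <...> tags, joining the pieces with filler."""
    returnstring = ""
    while True:
        openindex = inputstring.find("<")
        if openindex == -1:
            break
        # Same shape as the quoted line, with the space made configurable.
        returnstring += filler + inputstring[:openindex]
        closeindex = inputstring.find(">", openindex)
        if closeindex == -1:
            # Unterminated tag: discard the remainder of the message.
            inputstring = ""
            break
        inputstring = inputstring[closeindex + 1:]
    returnstring += filler + inputstring
    # Collapse runs of whitespace and trim the ends.
    return " ".join(returnstring.split())


phrase = "would you like to be financially independent"
spam = ("Wo<!Mf>uld yo<!Xt>u li<!ygR>ke t<!GIX>o b<!jTW>e "
        "Financ<!TX>ially Indep<!MgCq>endent")

with_space = strippedstring(spam)            # tags become spaces
no_space = strippedstring(spam, filler="")   # tags simply vanish

print(phrase in with_space.lower())  # False
print(phrase in no_space.lower())    # True

# The catch: words separated only by a tag get glued together.
print(strippedstring("free<br>money", filler=""))  # freemoney
```

In this sketch, dropping the space does make the banned phrase match, but it also fuses words that were separated only by a tag such as `<br>`, so filters that rely on those word breaks could start missing. A middle road might be to delete `<!...>` style tags outright and keep the space for everything else, though that is untested against the real script.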
|
All times are GMT -4.