ACADEMIC COMPUTING and COMMUNICATIONS CENTER

How Mailtools Filters Work

In General
When mail destined for your account arrives on the server, it is handed off to a program called procmail. Before procmail delivers a message to your Inbox, it checks whether you have a file called .procmailrc in your home directory and, if you do, looks inside it for instructions on how to deliver the message. The Mailtools Web interface for creating email filters translates the criteria and action you specify into procmail "rules" and places those rules in your .procmailrc file.

Sounds simple, and it is, but the problem is that the procmail language is incredibly complicated. Consider, for example, Ima Historian's filter for mail from the roman_history list that contains "latin verbs" in the subject. Here is the procmail rule that Mailtools created for Ima:

    :0 :
    * ^FROM:.*roman_history\@example\.com
    * ^Subject:.*LATIN\ VERBS
    mail/verbage

Not only is the text cryptic, but each colon, slash, asterisk, and caret means something specific, and the placement of the commands on the lines is also significant. Creating a set of functional procmail filters by hand is not for the faint of heart.

You can read more about procmail by logging in to your tigger or icarus account and looking at the man pages for procmailex (examples), procmailrc (about the procmailrc file), and procmail, in that order. (Enter: man procmailex and so on.) If you do, you will see that the Mailtools filters use only a small fraction of the services that procmail provides.

If you want to venture further into procmail, a good way to start is to create some filters with the Web interface and then edit the resulting .procmailrc file as you desire. Note, however, that the Web interface tools will no longer work on any filters you change manually.
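To make the anatomy of a rule like Ima's more concrete, here is a minimal Python sketch of the matching logic only, not of procmail itself. It assumes, as procmail does by default, that each `*` condition is a regular expression tested against the message headers without regard to case, and that every condition must match before the action (delivery to `mail/verbage`) is taken. The names `recipe_matches`, `deliver_to`, and `IMA_CONDITIONS` are hypothetical, and Python's regular expressions only approximate procmail's.

```python
import re

# The two condition lines from Ima's rule, without the leading "* ".
IMA_CONDITIONS = [
    r"^FROM:.*roman_history\@example\.com",
    r"^Subject:.*LATIN\ VERBS",
]
IMA_MAILBOX = "mail/verbage"  # the action line of the rule


def recipe_matches(headers, conditions):
    """Return True only if every condition matches some header line.

    IGNORECASE mirrors procmail's default case-insensitive matching;
    MULTILINE lets "^" anchor at the start of each header line.
    """
    flags = re.IGNORECASE | re.MULTILINE
    return all(re.search(cond, headers, flags) for cond in conditions)


def deliver_to(headers):
    """Pick a mailbox: Ima's folder if the rule matches, else the Inbox."""
    if recipe_matches(headers, IMA_CONDITIONS):
        return IMA_MAILBOX
    return "Inbox"
```

In this model, a message whose headers include "From: roman_history@example.com" and "Subject: Re: Latin verbs of motion" goes to mail/verbage, while list mail on any other subject falls through to the Inbox.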
If you already have a .procmailrc file (and you will know if you do), you can still use the Mailtools interface to create additional filters. The Mailtools filters will be added to the bottom of your existing .procmailrc file and therefore will be applied last. If you want them to be applied somewhere else, you'll have to move them by hand; if you do, be sure not to change the text of the Mailtools filters.
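Why position in the file matters can be seen in a small Python model. It assumes the usual procmail behavior that rules are tried from top to bottom and that processing stops at the first rule that delivers the message; the predicates and mailbox names below are made up for illustration.

```python
def first_delivery(rules, message):
    """Try rules in file order; the first one that matches delivers the mail.

    `rules` is a list of (predicate, mailbox) pairs, top of file first.
    If nothing matches, the message falls through to the Inbox.
    """
    for predicate, mailbox in rules:
        if predicate(message):
            return mailbox
    return "Inbox"


# A hand-written rule that already existed in the file (hypothetical).
def from_list(message):
    return "roman_history@example.com" in message["from"]


# A filter added later through the Web interface (hypothetical).
def about_verbs(message):
    return "latin verbs" in message["subject"].lower()


# Mailtools filter appended at the bottom, as the Web interface does.
existing_then_mailtools = [(from_list, "mail/history"), (about_verbs, "mail/verbage")]
# The same two rules after moving the Mailtools filter up by hand.
mailtools_moved_up = [(about_verbs, "mail/verbage"), (from_list, "mail/history")]

message = {"from": "roman_history@example.com", "subject": "Latin verbs, again"}
```

With the Mailtools filter at the bottom, `first_delivery(existing_then_mailtools, message)` picks mail/history because the older rule matches first; with the filter moved up, the same message goes to mail/verbage.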
The Antispam Filter in Particular
The Mailtools antispam rule set is rather long and complicated, and it may be changed as circumstances change. So, instead of placing the entire rule set into your .procmailrc file, the Mailtools utility places a single line there that tells procmail to include the global antispam filter in your rules at that point. The global antispam filter is located in a global directory with some other ready-made filters.

The procmail rules that make up the antispam filter were selected from a large set of possible methods for determining whether a piece of incoming mail is spam, after careful research on the efficacy and efficiency of each. They may change in the future: new spamming techniques may render certain methods more or less applicable, or new hardware may allow us to use less efficient filters.

The first and most effective method employed in the Mailtools antispam filter is simply to take advantage of the laziness of most spammers. Currently, eighty to ninety percent of spam is sent without a valid To: or Cc: header. That is, these fields, if they are present at all, do not contain your email address; spammers simply use the same set of headers for every piece of mail they send. Mail sent by colleagues or friends, however, will almost never look like this. (The exceptions are mail that is "bounced" to you or sent as a blind carbon copy, Bcc:. That's another reason why you shouldn't immediately delete all spam and why you should check your spam mailbox on a regular basis.)

Email discussion lists like Ima's roman_history list are an exception to this rule: they usually distribute mail without your address in the To: or Cc: field. Thus the only way to determine whether a piece of list mail is valid is for you to specify the lists to which you are subscribed.
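The test described above (is your address in To: or Cc:, and if not, does the mail involve a list you have named?) can be sketched in Python as follows. This is an illustration, not the procmail rules Mailtools actually installs. In particular, the article does not say which headers are used to recognize list mail, so checking To:, Cc:, From:, and Sender: is an assumption, and the function names and addresses are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def header_addresses(msg, *names):
    """Collect the lowercased addresses found in the named headers."""
    values = []
    for name in names:
        values.extend(msg.get_all(name, []))
    return {addr.lower() for _, addr in getaddresses(values) if addr}


def passes_first_check(raw_message, my_addresses, my_lists):
    """Return True if the mail is addressed to you or involves one of your lists.

    `my_addresses` and `my_lists` are sets of lowercase email addresses.
    """
    msg = message_from_string(raw_message)
    recipients = header_addresses(msg, "To", "Cc")
    if recipients & my_addresses:
        return True  # your address is in To: or Cc:
    # Assumption: list mail names the list address in one of these headers.
    list_fields = header_addresses(msg, "To", "Cc", "From", "Sender")
    return bool(list_fields & my_lists)
```

Note that a legitimate Bcc: to you would fail this check, which is exactly the case the article warns about when it recommends looking through your spam mailbox regularly.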
So, the first set of rules in the antispam filter marks as spam any mail that is both a) not addressed to you and b) not from a valid list or address as defined by you. Using these criteria will catch a large fraction of the spam you receive.

Unfortunately, the efficacy of identifying the rest of your spam decreases dramatically at this point. Of the ten to twenty percent of spam that makes it through the first step undetected, you might expect only ten percent or so (i.e., one to two percent of all spam) to be caught by the next set of rules, which we call "headercheck" rules. The headercheck rules look through the headers of each email message for common signs that the mail is spam: for example, invalid header tags, empty or missing To: fields, empty or missing From: fields, missing or invalid message IDs, invalid From: settings, invalid IP addresses, header forgeries, and so on.

As of now, these two are the only rule sets in the Mailtools antispam filter. There are many additional methods for identifying spam in use elsewhere, which we tested for inclusion. Our tests showed them all to be either ineffective, too dynamic (requiring constant maintenance or updates), or too inefficient (requiring too much CPU time for the amount of mail we receive).

For example, there are organizations that try to keep track of spammers and the hosts from which they send their spam, and they make these lists publicly available. Such organizations include the Realtime Blackhole List, Spamhaus, and so on. The theory is that you check the host that each new message originally came from against a list of known spamming hosts; if the host is on the list, then any message from it is treated as spam. Our tests, however, showed that not even one percent of spam messages were identified by this kind of rule. And these lists sometimes include, accidentally or otherwise, hosts that also have innocent senders.
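For reference, the lookup behind this kind of list is commonly done through DNS: the sending host's IPv4 address is reversed, prefixed to the list's zone name, and resolved, and an answer means the host is listed. The Python sketch below follows that general convention rather than anything specific to our tests. The zone `dnsbl.example.org` is a placeholder, not a real service, and `is_listed` performs a network query each time it is called.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the list's zone has an entry for this address.

    Each call is a DNS round trip, so checking every incoming message
    means one lookup per message. A failed lookup counts as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True
```

For the address 192.0.2.10, the name that would be queried in the placeholder zone is 10.2.0.192.dnsbl.example.org.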
Additionally, this kind of rule would require either constantly connecting to these other sites to check the hostnames, or constantly updating and maintaining local lists received from them.

In summary, the two sets of filters described above make up the antispam methods we've decided to offer via the Web interface. They are a first attempt at mixing accuracy, effectiveness, and simplicity into taking a palatable bite out of spam.
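As a rough illustration of the second of those rule sets, the Python sketch below checks for three of the signs listed for the headercheck rules: an empty or missing To: field, an empty, missing, or malformed From: field, and a missing or malformed message ID. The actual headercheck rules are procmail rules that are not reproduced in this article, so the function name, the warning strings, and the message-ID pattern here are all assumptions.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Rough shape of a message ID: <something@something>
MESSAGE_ID = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def header_warnings(raw_message):
    """Return a list of warning strings; an empty list means nothing was flagged."""
    msg = message_from_string(raw_message)
    warnings = []

    if not (msg.get("To") or "").strip():
        warnings.append("empty or missing To:")

    from_value = (msg.get("From") or "").strip()
    if not from_value:
        warnings.append("empty or missing From:")
    elif "@" not in parseaddr(from_value)[1]:
        warnings.append("invalid From:")

    message_id = (msg.get("Message-ID") or "").strip()
    if not MESSAGE_ID.match(message_id):
        warnings.append("missing or invalid message id")

    return warnings
```

A filter built this way would treat any non-empty result as a sign of spam; how many warnings should count, and which ones, is a tuning decision this sketch leaves open.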
The A3C Connection, April/May/June 2001
2001-8-10 connect@uic.edu