
Spam Sort

A Perl script that feeds SpamAssassin's Bayesian filters using simple email forwarding


Spam Sort is a Perl script that checks two sets of mailboxes: one for messages flagged as spam, one for messages flagged as ham (not spam).

New messages are fetched from the server over IMAP and parsed to recover the original message as it looked before forwarding, and only then passed to sa-learn. Because of this, you can select a series of messages and forward them all in a single message.

I wrote Spam Sort for a company that wanted to give its users an easy way to train their SpamAssassin filters. After setting up this script and configuring a mail account to receive the human-flagged spam and ham messages, their intranet users can simply forward messages to is_spam@theircompany.com and not_spam@theircompany.com. The results have been fantastic: a 50% reduction in spam.
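To illustrate the unwrapping step, here is a minimal Python sketch (not the script's actual Perl code). It assumes the mail client forwards messages as attachments, so each original arrives as a message/rfc822 part; inline forwards are not handled. The function name extract_originals is made up for this example.

```python
from email import message_from_bytes, policy


def extract_originals(forward_bytes):
    """Return the raw bytes of every message attached to a forward.

    Assumption: each original was forwarded as a message/rfc822
    attachment. A message with no such parts yields an empty list.
    """
    wrapper = message_from_bytes(forward_bytes, policy=policy.default)
    originals = []
    # Only look at the top-level parts of the forwarding message.
    for part in wrapper.iter_parts():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part holds exactly one embedded message.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Each returned byte string could then be written to a file and handed to sa-learn on its own, which is what makes forwarding several messages at once workable.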

The typical usage of this script is to set up a special mail user (spam_sort, for instance) and alias is_spam@yourdomain.com and not_spam@yourdomain.com to that mail account. Then simply forward spam messages to is_spam@yourdomain.com and ham messages to not_spam@yourdomain.com, put this script into cron for nightly processing, and your Bayes database will be updated automatically.
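Since both aliases deliver to the same account, something has to decide whether a given forward is spam or ham. The Python sketch below shows one way to do that, assuming the alias is still visible in the To or Cc header; the script itself may tell them apart differently (by folder, for example). The names ALIAS_FLAGS and sa_learn_command are hypothetical; --spam and --ham are sa-learn's real learning options.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses

# Hypothetical table mapping the alias local part to an sa-learn option.
ALIAS_FLAGS = {"is_spam": "--spam", "not_spam": "--ham"}


def sa_learn_command(forward_bytes, message_path):
    """Build the sa-learn argument list for a forward.

    message_path is the file holding the extracted original message.
    Returns None when no recipient matches a known alias.
    """
    msg = message_from_bytes(forward_bytes, policy=policy.default)
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, addr in getaddresses([str(h) for h in recipients]):
        local_part = addr.partition("@")[0].lower()
        flag = ALIAS_FLAGS.get(local_part)
        if flag:
            return ["sa-learn", flag, message_path]
    return None
```

The returned list can be passed to subprocess.run from a nightly cron job. Returning None for unknown recipients keeps stray mail out of the Bayes database.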



April 6th, 2004     Version 0.3 released. Not so many core functionality changes, mostly general cleanup. The script now uses Getopt::Long to accept command line arguments, so you no longer have to edit the file by hand as in previous versions. There is now proper documentation using the Pod::Usage module: run the script with -help or -man to see the list of options or the man page. Functionality-wise, I added a check to see whether the destination spam/ham folders exist. If they do not, the script creates them automatically before processing begins.

March 29th, 2004     Version 0.2 released. A bunch of improvements, mostly so that the script can be used alongside a second method: users with IMAP access can move the original spam messages directly into the same folder hierarchy as the forwarded messages. In particular, any message placed directly in the is_spam or not_spam folders will be picked up and processed. If you intend to have users move their messages to these folders, you may want to set the 'process_all_filtered_msgs' variable to 1. Otherwise, user-moved messages marked as 'read' will be skipped.
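The effect of 'process_all_filtered_msgs' can be sketched in Python as a choice of IMAP search key. This assumes that "read" corresponds to the IMAP \Seen flag and that the script otherwise selects only unseen messages; the function names are made up, and conn stands for an imaplib.IMAP4 connection that is already logged in.

```python
def search_criterion(process_all_filtered_msgs):
    """Pick the IMAP SEARCH key: every message, or only unread ones."""
    return "ALL" if process_all_filtered_msgs else "UNSEEN"


def pending_message_ids(conn, folder, process_all_filtered_msgs=False):
    """List the message numbers in folder that should be processed.

    conn is expected to behave like imaplib.IMAP4: select() opens the
    folder and search() returns a status plus a list whose first item
    is a space-separated byte string of message numbers.
    """
    conn.select(folder)
    status, data = conn.search(None, search_criterion(process_all_filtered_msgs))
    if status != "OK":
        return []
    return (data[0] or b"").split()
```

With the flag left at 0, a message a user has already opened and then moved into is_spam is never returned by the search, which matches the skipping behaviour described above.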

March 25th, 2004     Initial release, version 0.1. I know there are a lot of things that could be implemented differently (command line arguments instead of configuration variables, better error checking for incomplete IMAP folder settings, tighter restrictions on the criteria for accepting messages, etc.), but that can wait until I either get around to it or you submit a patch to me :)

Marc Swanson <mswanson@mswanson.com>
M. Swanson Consulting | 121 Lee Hill Road | Lee, NH 03861 | 603-413-6833
mswanson@mswanson.com | www.mswanson.com