Sunday, 27 May 2007

How to Configure SpamAssassin Bayesian Filter to Work with Exchange

Some organisations run a Unix email gateway and use SpamAssassin to filter out spam before Internet email is delivered to the Exchange system. For better filtering results, you need to enable the Bayesian filter and train it with both spam and ham (legitimate email). However, manually pulling spam and ham out of Exchange mailboxes and importing them to train the Bayesian filter is a fairly time-consuming process.

Instead, you can create two folders in the Public Folder tree (one for spam, one for legitimate email) and ask people to move spam and ham into them. A Perl script then connects to Exchange over IMAP, pulls out everything in the two folders, and trains the Bayesian filter each time it runs. This way, everybody in your organisation can contribute the spam they receive by moving it to the spam public folder.


Figure 1

Next, copy the Perl script (learn-spam.pl) to the SpamAssassin server and adapt it to your environment. (I found this script in a forum on the Internet.)


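Change the following to match your environment. The IMAP server name and the two public folder names are set inside the script; the user name and password can be set in the script or passed on the command line.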

$imapserver = "YOUR IMAP SERVER";

-uid="USERNAME"
-pwd="PASSWORD"

learn_mail ($HOME."/spam/", ".spam", "This is SPAM/", 1, "--spam --showdots");
learn_mail ($HOME."/ham/", ".ham", "Legitimate Email/", 1, "--ham --showdots");
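
The -uid and -pwd values above are command-line options that override the defaults stored in the script's %options hash. Here is a minimal sketch of that behaviour. The account name and password are made up, and GetOptionsFromArray is used in place of the script's GetOptions only so that the example runs without real command-line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Defaults, as in the script's %options hash
my %options = (uid => "username", pwd => "password");

# Stand-in for @ARGV; "spamtrainer" and "s3cret" are hypothetical values
my @argv = ('-uid=spamtrainer', '-pwd=s3cret');

# Same option specification as the script: each option takes a string
GetOptionsFromArray (\@argv,
    "uid=s" => \$options{uid},
    "pwd=s" => \$options{pwd})
    or die "Bad command-line options\n";

# The command-line values have replaced the defaults
print "uid=$options{uid} pwd=$options{pwd}\n";
```

If neither option is given, the defaults in %options are used, so editing the hash in the script works as well.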
----------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Mail::IMAPClient;
use Shell;
use Env qw(HOME);
use Getopt::Long;

use File::Temp qw/ tempfile tempdir /;

my $imapserver = "exchangeserver.allaboutexchange.net";

# set to 1 to enable imapclient debugging
my $debug = 0;

# set to 1 if running under cron (disables output)
my $cron = 0;

my $filename;
my $fh;

my %options =
(
uid => "username",
pwd => "password"
);

my $cmdsts = GetOptions ("uid=s" => \$options{uid}, "pwd=s" => \$options{pwd});

if (!$options{uid}) { die "[SPAMASSASSIN] uid not set (-uid=username)\n"; }
if (!$options{pwd}) { die "[SPAMASSASSIN] pwd not set (-pwd=password)\n"; }

my $uid = $options{uid};
my $pwd = $options{pwd};

# login to imap server
my $imap = Mail::IMAPClient->new (Server=>$imapserver, User=>$uid,
Password=>$pwd, Debug=>$debug)
or die "Can't connect to $uid\@$imapserver: $@\n";

if ($imap)
{
my $count;

# Deal with spam first
learn_mail ($HOME."/spam/", ".spam", "This is SPAM/", 1, "--spam --showdots");

# Now deal with ham
learn_mail ($HOME."/ham/", ".ham", "Legitimate Email/", 1, "--ham --showdots");

}
else
{
die "[SPAMASSASSIN] Unable to log on to IMAP mail account! $options{uid}\n";
}

exit;

#
# read and learn mail from imap server
#
# arguments
# $dir directory to place retrieved messages in
# $ext file extension to use on retrieved messages
# $folder imap folder name on server
# $shared 0 if imap folder is in users mailbox
# 1 if imap folder is in shared name space or
# $sa_args additional arguments to specify to sa-learn
# (e.g. --spam or --ham)
#
sub learn_mail {
my $dir = shift (@_);
my $ext = shift (@_);
my $folder = shift (@_);
my $shared = shift (@_);
my $sa_args = shift (@_);

my $count = 0;

# tidy up directory before run
clear_directory ($dir, $ext);

# read mail from server
$count = read_mail ($dir, $ext, $folder, $shared);
if ($count > 0)
{
# learn about mail
sa_learn ($dir, $ext, $sa_args);

# tidy up files after sa-learn is called
clear_directory ($dir, $ext);
}
}


#
# reads mail from an imap folder and saves in a local directory
#
# arguments
# $dir directory to place retrieved messages in
# $ext file extension to use on retrieved messages
# $folder imap folder name on server
# $shared 0 if imap folder is in users mailbox
# 1 if imap folder is in shared name space or
sub read_mail {
my $dir = shift (@_);
my $ext = shift (@_);
my $folder = shift (@_);
my $shared = shift (@_);
my $count = 0;
my $target = "";

if ($shared)
{
# use a shared public folder instead
my ($prefix, $sep) = @{$imap->namespace->[2][0]}
or die "Can't get shared folder namespace or separator: $@\n";

$target = $prefix.
($prefix =~ /\Q$sep\E$/ || $folder =~ /^\Q$sep/ ? "" : $sep).
$folder;
}
else { $target = $folder; }

$imap->select ($target) or die "Cannot select $target: $@\n";

# Shared public folders are handled above through the $shared argument,
# so nothing needs to be commented or uncommented here

# read through all messages
my @msgs = $imap->search("ALL");
foreach my $msg (@msgs)
{
($fh, $filename) = tempfile (SUFFIX => $ext, DIR => $dir);
$imap->message_to_file ($fh, $msg);
close $fh;
$count++;
}

if ($cron == 0) { print "Retrieved $count messages from $target\n"; }

return $count;
}

#
# Removes files in directory $dir with extension $ext
#
sub clear_directory{
my $dir = shift (@_);
my $ext = shift (@_);

opendir (DIR, $dir) or die "Couldn't open dir: $dir\n";
my @files = readdir (DIR);
closedir (DIR);

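for (my $i = 0; $i <= $#files; $i++ )
{
# \Q...\E quotes $ext so that the leading dot is matched literally
if ($files[$i] =~ /\Q$ext\E$/) { unlink ($dir.$files[$i]); }
}
}

#
# execute sa-learn command
#
# arguments
# $dir directory containing the retrieved messages
# $ext file extension used on retrieved messages (not used here)
# $type arguments to pass to sa-learn (e.g. --spam or --ham)
#
sub sa_learn {
my $dir = shift (@_);
my $ext = shift (@_);
my $type = shift (@_);

my $learncmd = "/usr/bin/sa-learn ".$type." --dir ".$dir;
if ($cron == 0) { $learncmd .= " --showdots"; }
else { $learncmd .= " > /dev/null 2>&1"; }

#
# Run sa-learn on the directory (through the shell, so that the
# output redirection used in cron mode works)
#
my @args = ($learncmd);

system (@args) == 0 or die "system @args failed: $?";
}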

----------------------------------------------------------
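
The least obvious part of the script is how read_mail builds the public folder name from the IMAP namespace. The sketch below isolates that step as a standalone helper. The name join_folder is introduced here for illustration only, and the prefix and separator values are assumed examples, since the real ones come from the server's NAMESPACE response.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join a namespace prefix and a folder name, inserting the separator
# only when neither side already supplies it (same logic as read_mail).
sub join_folder {
    my ($prefix, $sep, $folder) = @_;
    my $has_sep = ($prefix =~ /\Q$sep\E$/ || $folder =~ /^\Q$sep\E/);
    return $prefix . ($has_sep ? "" : $sep) . $folder;
}

# Hypothetical prefix and separator; a real server reports its own values
print join_folder ("Public Folders/", "/", "This is SPAM/"), "\n";
print join_folder ("Public Folders", "/", "Legitimate Email/"), "\n";
```

Both calls give a name that starts with the prefix followed by exactly one separator, which is the form passed to $imap->select. If the script cannot select a folder, printing $target just before the select call is a quick way to see what name was built.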
I am very happy with the spam filtering results after implementing this.

Friday, 4 May 2007

Configure Mailbox Recipient Policy to Work with GFI MailArchiver

Once GFI MailArchiver is up and running, all new incoming and outgoing emails are archived (if you have configured it to do so). What about the old emails from before GFI MailArchiver was installed? The answer is to import them into one or more archive databases. The GFI MailArchiver for Exchange Manual describes the whole procedure, from configuring the GFI MailArchiver Import Service to using the GFI PST-Exchange Email Export wizard.

However, importing old emails does not purge them from the current Exchange databases. It simply makes a copy of each email and puts it into the GFI databases.

Once you have imported the old emails into the GFI MailArchiver databases, you need to create a Mailbox Manager recipient policy to delete the old emails from Exchange.

• In System Manager, expand the Recipients node, and then select Recipient Policies.
• Right-click Recipient Policies, point to New, and then click Recipient Policy.
• In the New Policy dialog box, select the Mailbox Manager Settings check box, and then click OK.
• In the Name field, type a name for the recipient policy.



• Click the Modify button and select the recipient types that you want the new policy to apply to.
• Click the Mailbox Manager Settings (Policy) tab. Set the Age Limit there so that emails older than the limit are purged.
• Click OK to finish.

Remember, before you apply this mailbox recipient policy, make sure you get permission from your boss and let all users know what is going to happen, as they will need to log on to the web page to access their old email and will no longer find it in their email client software.