Hi guys,
I can definitely double-check that housekeeping are on top of the training. While I'm here, it may be useful if I give some further detail on how the spam filter is trained.
There is an automatic training system which runs every night on all of the mxcore servers. This system relies on a cron job on the mailops server.
The script run by that cron job has only one purpose: it moves emails from the IMAP folders for spam and notspam, under the despamchecker account, into a network share held on the NAS.
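To make the mailops step more concrete, here is a rough Python sketch of what such a mover could look like. It is not the actual script: the function name drain_folder, the FOLDER_MAP table and the .eml file naming are my own assumptions for illustration, and conn is assumed to be an already authenticated imaplib.IMAP4 (or IMAP4_SSL) connection.

```python
import os
import uuid

# IMAP folder under despamchecker -> folder on the NAS share.
# The IMAP names come from this post; treating the share folders as
# plain subdirectories of one mount point is an assumption.
FOLDER_MAP = {"Spam": "spam", "notspam": "clean"}


def drain_folder(conn, mailbox, dest_dir):
    """Save every message in `mailbox` into `dest_dir`, then delete it.

    `conn` must behave like an authenticated imaplib.IMAP4 object.
    Returns the number of messages moved.
    """
    os.makedirs(dest_dir, exist_ok=True)
    typ, _ = conn.select(mailbox)
    if typ != "OK":
        raise RuntimeError(f"cannot select mailbox {mailbox!r}")
    typ, data = conn.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError(f"cannot list messages in {mailbox!r}")

    moved = 0
    for num in data[0].split():
        typ, parts = conn.fetch(num, "(RFC822)")
        if typ != "OK":
            continue  # leave the message in place and try again next run
        raw_message = parts[0][1]
        # Random file names avoid collisions between nightly runs.
        path = os.path.join(dest_dir, uuid.uuid4().hex + ".eml")
        with open(path, "wb") as fh:
            fh.write(raw_message)
        # Only flag for deletion once the copy is safely on disk.
        conn.store(num, "+FLAGS", "\\Deleted")
        moved += 1
    conn.expunge()
    return moved
```

In use you would log in with imaplib, then call drain_folder once per entry in FOLDER_MAP, pointing dest_dir at wherever the NAS share is mounted.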
This share has two folders, clean and spam. A script held on each of the mxcore servers picks up the emails in these folders and passes them through dspamc with the options that match the folder they came from. For instance, emails held in the clean folder are trained as innocent.
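As a hedged illustration of the folder-to-option mapping, the sketch below builds a dspamc command line per folder and feeds each message to it on stdin. The helper names, the default user "despam-global" and the choice of --source=corpus are assumptions on my part; the real script may well use a different user or source.

```python
import subprocess
from pathlib import Path

# Share folder -> DSPAM class. "clean" mail is trained as innocent.
FOLDER_CLASS = {"spam": "spam", "clean": "innocent"}


def dspamc_args(folder, user="despam-global"):
    """Build the dspamc argument list for one share folder.

    `user` is a hypothetical DSPAM user name; --source=corpus is an
    assumed training source, not something confirmed in this post.
    """
    return [
        "dspamc",
        "--user", user,
        f"--class={FOLDER_CLASS[folder]}",
        "--source=corpus",
    ]


def train_folder(share_root, folder, user="despam-global", run=subprocess.run):
    """Pipe every file in share_root/folder through dspamc.

    `run` is injectable so the logic can be exercised without dspamc
    being installed. Returns the number of messages processed.
    """
    trained = 0
    for path in sorted(Path(share_root, folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            run(dspamc_args(folder, user), stdin=fh, check=True)
        trained += 1
    return trained
```

Calling train_folder once for "spam" and once for "clean" covers both halves of the share.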
The script forks into two processes, one which trains on the spam emails while the other trains on the clean ones. It is run on the servers in a staggered way, each server launching the process 15 minutes after the server numbered one below it; e.g. sunmxcore01 starts at 01:00 and sunmxcore02 starts at 01:15.
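The staggering is simple arithmetic on the server number. A minimal sketch, assuming the number is the trailing digits of the hostname and that the pattern simply continues past sunmxcore02 (start_time is a made-up helper name):

```python
import re
from datetime import datetime, timedelta

BASE_START = "01:00"   # sunmxcore01, as stated above
STEP_MINUTES = 15      # delay between consecutive servers


def start_time(hostname):
    """Return the HH:MM training start time for an mxcore host."""
    match = re.search(r"(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no trailing server number in {hostname!r}")
    index = int(match.group(1))
    if index < 1:
        raise ValueError("server numbers are assumed to start at 1")
    base = datetime.strptime(BASE_START, "%H:%M")
    start = base + timedelta(minutes=STEP_MINUTES * (index - 1))
    return start.strftime("%H:%M")


print(start_time("sunmxcore01"))  # 01:00
print(start_time("sunmxcore02"))  # 01:15
```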
The IMAP folders used by the automatic training system are populated by our housekeeping team.
A member of that team will:
- Set up an IMAP client to access the despamchecker mailboxes.
- In the despamchecker+spam mailbox, check the headers of all mails and a sample of the actual mails.
- Once happy that the mails in the despamchecker+spam account are spam, move a maximum of 400 over to the Spam IMAP folder under the despamchecker account.
- In the despamchecker+notspam mailbox, check the headers of all mails and a sample of the actual mails.
- Once happy that the mails in the despamchecker+notspam account are not spam, move a maximum of 400 over to the notspam IMAP folder under the despamchecker account.
- Now check that the Spam and notspam folders under the despamchecker account have the same number of emails pending.
- Clean out the despamchecker+spam and despamchecker+notspam accounts.
- Any emails in either the Spam or notspam folder that are larger than 10K can be safely deleted. DSPAM trains on text, so attachments and large emails are of no use for training.
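The numeric rules in the checklist above (at most 400 per folder, equal counts, nothing over 10K) can be sketched as below. This is only an illustration: the function names are made up, "10K" is assumed to mean 10 * 1024 bytes, and trimming the larger batch is just one way of getting the two counts to match.

```python
MAX_BATCH = 400        # maximum mails moved per folder
MAX_BYTES = 10 * 1024  # assumed meaning of "10K"


def select_batch(messages, max_batch=MAX_BATCH, max_bytes=MAX_BYTES):
    """Pick message ids worth training on.

    `messages` is a list of (message_id, size_in_bytes) pairs.
    Oversized mails are dropped, then the batch is capped.
    """
    usable = [mid for mid, size in messages if size <= max_bytes]
    return usable[:max_batch]


def balance(spam_ids, notspam_ids):
    """Trim both batches to the same length so the counts match."""
    count = min(len(spam_ids), len(notspam_ids))
    return spam_ids[:count], notspam_ids[:count]


spam = select_batch([("s1", 2048), ("s2", 50000), ("s3", 900)])
ham = select_batch([("h1", 1200)])
print(balance(spam, ham))  # (['s1'], ['h1'])
```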
We've also been using the odd honeypot address here and there following the recent spam problems, as mentioned here.