By Request - My dspam Training Script<br />
2008-02-26<br />
<br />
<a href="http://bp0.blogger.com/_h6vxr1Bf3cw/R8SYYqr0LiI/AAAAAAAAAfk/LtC3mny9Izw/s1600-h/dspam-logo-eyes.gif"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp0.blogger.com/_h6vxr1Bf3cw/R8SYYqr0LiI/AAAAAAAAAfk/LtC3mny9Izw/s320/dspam-logo-eyes.gif" alt="" id="BLOGGER_PHOTO_ID_5171425821670518306" border="0" /></a>In a <a href="http://c600g.blogspot.com/2007/02/dspam-installed.html">post</a> I made about a year ago, I mentioned a script I wrote that trains <a href="http://dspam.nuclearelephant.com/">dspam</a> on spam it missed and corrects it when it wrongly flags a good (or "ham") email as spam. Someone asked me to post that script, so here it is. Please note that my <a href="http://www.qmail.org/top.html">qmail</a> installation uses the <a href="http://en.wikipedia.org/wiki/Maildir">maildir</a> format, and the script relies on that layout!<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">--- start file: <a href="http://www.shermanloan.com/%7Ealank/train-spam.sh">train-spam.sh</a> ---</span> <span style="font-family:courier new;"><br />
#!/bin/sh<br />
<br />
# train-spam.sh<br />
#<br />
# Description: Checks each user's /home/USER/Maildir/.Spam.Missed<br />
# directories to see if the user placed any "missed" spam<br />
# messages which got through the spam filter to their INBOX.<br />
# If there are messages in this directory, then the script<br />
# invokes dspam to retrain that user's tokens to try<br />
# and improve the defenses for next time. False positives<br />
# placed in .Spam.NotSpam are retrained as ham the same way.<br />
#<br />
<br />
# learn_spam - Function which takes a directory and a user as<br />
# arguments, and then feeds that directory to our anti-spam<br />
# applications for further SPAM training.<br />
#<br />
# Arguments:<br />
# $1 - Directory name 
containing SPAM emails. Required<br />
# $2 - User name. If it is not provided, $USER will be used.<br />
#<br />
# Example:<br />
# learn_spam /home/alank/Maildir/.Spam.Missed/cur alank<br />
#<br />
learn_spam() {<br />
<br />
 # fall back to $USER when no user name is given<br />
 user=${2:-$USER}<br />
<br />
 # loop through all emails in given directory<br />
 for email in "$1"/*; do<br />
<br />
 # skip the unexpanded pattern when the directory is empty or missing<br />
 [ -f "$email" ] || continue<br />
<br />
 # process SPAM email using DSPAM<br />
 /usr/local/bin/dspam --mode=teft --source=error --class=spam --feature=chained,noise --user "$user" < "$email"<br />
 printf "."<br />
<br />
 # delete SPAM email<br />
 rm "$email"<br />
<br />
 done # end of email loop<br />
<br />
} # end function learn_spam<br />
<br />
# learn_ham - Function which takes a directory and a user as<br />
# arguments, and then feeds that directory to our anti-spam<br />
# applications for further HAM training.<br />
#<br />
# Arguments:<br />
# $1 - Directory name containing HAM emails. Required<br />
# $2 - User name. If it is not provided, $USER will be used.<br />
#<br />
# Example:<br />
# learn_ham /home/alank/Maildir/.Spam.NotSpam/cur alank<br />
#<br />
learn_ham() {<br />
<br />
 # fall back to $USER when no user name is given<br />
 user=${2:-$USER}<br />
<br />
 # loop through all emails in given directory<br />
 for email in "$1"/*; do<br />
<br />
 # skip the unexpanded pattern when the directory is empty or missing<br />
 [ -f "$email" ] || continue<br />
<br />
 # process HAM email using DSPAM<br />
 /usr/local/bin/dspam --mode=teft --source=error --class=innocent --feature=chained,noise --user "$user" < "$email"<br />
 printf "."<br />
<br />
 # delete HAM<br />
 rm "$email"<br />
<br />
 done # end of email loop<br />
<br />
} # end function learn_ham<br />
<br />
#<br />
# Script starts here!<br />
#<br />
<br />
# loop through all user home directories<br />
for home in /home/*; do<br />
<br />
 # $file holds the bare directory name, i.e. the user name<br />
 file=${home##*/}<br />
<br />
 # if there is a Spam/Missed maildir<br />
 if [ -d "/home/$file/Maildir/.Spam.Missed/cur" ]; then<br />
 <br />
 # then process any missed SPAM<br />
 printf "missed spam for %s: " "$file"<br />
 learn_spam "/home/$file/Maildir/.Spam.Missed/cur" "$file"<br />
 learn_spam "/home/$file/Maildir/.Spam.Missed/new" "$file"<br />
 echo ""<br />
 <br />
 fi # end if<br />
<br />
 # if there is a Spam/NotSpam dir<br />
 if [ -d 
"/home/$file/Maildir/.Spam.NotSpam/cur" ]; then<br />
 <br />
 # then process any falsely identified spam, i.e. HAM<br />
 printf "false positives for %s: " "$file"<br />
 learn_ham "/home/$file/Maildir/.Spam.NotSpam/cur" "$file"<br />
 learn_ham "/home/$file/Maildir/.Spam.NotSpam/new" "$file"<br />
 echo ""<br />
<br />
 fi # end if<br />
<br />
done # end for loop<br />
<br />
echo "Done!"<br />
--- end file: </span></span><span style="font-size:85%;"><span style="font-family:courier new;"><a href="http://www.shermanloan.com/%7Ealank/train-spam.sh">train-spam.sh</a></span></span><span style="font-size:85%;"><span style="font-family:courier new;"> ---</span><br /><br /></span>I keep the above script in <span style="font-family:courier new;">/root</span> and run it from a <a href="http://en.wikipedia.org/wiki/Crontab">cron</a> job early every morning. If your missed-spam and not-spam folders are named differently, edit the <span style="font-family:courier new;">.Spam.Missed</span> and <span style="font-family:courier new;">.Spam.NotSpam</span> paths in the script to match. Good luck, and I hope it helps in your continuing battle against spam!<br />
<br />
<a href="http://www.blogger.com/profile/00999861302655014098">Alan</a>
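
The training script deletes each message once it has been fed to dspam, so it can be worth checking what is waiting in the training folders before the cron job runs. The sketch below is a hypothetical read-only helper, not part of train-spam.sh: the names count_pending and report_pending are invented here, and the home root and folder names are passed as arguments on the assumption that your folders may be named differently.

```shell
#!/bin/sh
# Hypothetical helpers (not part of train-spam.sh). They only read the
# Maildir tree; nothing is passed to dspam and nothing is deleted.

# count_pending - count messages waiting in one training folder,
# looking in both its cur/ and new/ subdirectories.
#   $1 - home root, e.g. /home
#   $2 - user name
#   $3 - Maildir folder name, e.g. .Spam.Missed
count_pending() {
    count=0
    for sub in cur new; do
        for email in "$1/$2/Maildir/$3/$sub"/*; do
            # an unmatched glob stays literal, so skip non-files
            [ -f "$email" ] || continue
            count=$((count + 1))
        done
    done
    echo "$count"
}

# report_pending - print "user folder count" for every user who has
# one of the two training folders.
#   $1 - home root
#   $2 - missed-spam folder name
#   $3 - false-positive folder name
report_pending() {
    for home in "$1"/*; do
        [ -d "$home" ] || continue
        user=${home##*/}
        for folder in "$2" "$3"; do
            [ -d "$home/Maildir/$folder/cur" ] || continue
            echo "$user $folder $(count_pending "$1" "$user" "$folder")"
        done
    done
}
```

For example, report_pending /home .Spam.Missed .Spam.NotSpam prints one line per user and folder, in the form "user folder count". For the daily run itself, a root crontab entry along the lines of 30 4 * * * /root/train-spam.sh would fit an early-morning schedule; the 04:30 time is only an illustration.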