RE: [vps-mail] Bayesian AND BlackLists
- Subject: RE: [vps-mail] Bayesian AND BlackLists
- From: "Jim Smith" <maillists@xxxxxxxxxxxxxxxx>
- Date: Thu, 18 Dec 2003 07:07:56 -0500
Thanks for the link to www.spamassassin.org/doc/sa-learn.html. I couldn't
find that documentation with man sa-learn on my VPS2 either, so the link helped.
Now for the follow-up questions after I RTFM :). Gathering a thousand spams
won't be difficult at all (a day or two will do it). However, gathering a
thousand legit emails may be harder. One concern is that someone mentioned
that headers are important to the Bayes filtering. When I get a few legit
emails, I read them and then usually move them to a folder on my PC (using
OutLuck). If I forward or resend them to another folder on the server, that
will change the headers and skew the results, right? And if I instead leave
a copy on the server and then go in through iManager to move them to another
folder, won't that also leave a mark on the headers?
Excluding all my mailing list messages (those would really confuse the
filters and aren't on my main email address anyway), I only get a dozen or
two legit emails a day mixed in with my thousand spams, so it will be a long
time before I can run these filters. But from what everyone is saying about
the effectiveness of adding Bayes, I'm sure it will be worth it. I just
don't want to go through several weeks of accumulation only to find that
I've done it wrong.
So, what's the recommended way to separate the emails without "tainting" the
headers?
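One way to check this rather than guess is to save a copy of a message before and after a given move and compare the headers. The following is only a minimal sketch in Python under stated assumptions: the function name changed_headers and both sample messages are hypothetical, and it compares header values only, not the message body.

```python
from email import message_from_string


def changed_headers(before: str, after: str) -> dict:
    """Return {header_name: (old_values, new_values)} for headers that differ."""
    a = message_from_string(before)
    b = message_from_string(after)
    # Header names are case-insensitive, so compare them in lowercase.
    names = {k.lower() for k in a.keys()} | {k.lower() for k in b.keys()}
    diff = {}
    for name in sorted(names):
        old = a.get_all(name, [])
        new = b.get_all(name, [])
        if old != new:
            diff[name] = (old, new)
    return diff


# Hypothetical messages, for illustration only.
original = "From: alice@example.com\nSubject: Lunch?\n\nNoon works.\n"
forwarded = (
    "From: jim@example.com\n"
    "Subject: FW: Lunch?\n"
    "X-Forwarded: yes\n"
    "\n"
    "Noon works.\n"
)

print(changed_headers(original, original))           # an untouched copy differs nowhere
print(sorted(changed_headers(original, forwarded)))  # names of the headers that changed
```

If a particular way of moving mail reports no differences on a few test messages, that method is probably safe for building the training folders.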
Thanks,
Jim Smith
--------------------------------------------------------
Jim Smith, Blarneystone, LLC.
Website Design, Hosting, Development & Enhancement
E-MAIL: jimsmith@xxxxxxxxxxxxxxxx
WEBSITE: http://www.blarneystone.com
--------------------------------------------------------
> -----Original Message-----
> From: owner-vps-mail@xxxxxxxxxxxx [mailto:owner-vps-mail@xxxxxxxxxxxx]On
> Behalf Of Sikaspam
> Sent: Monday, December 15, 2003 1:47 AM
> To: vps-mail@xxxxxxxxxxxx
> Subject: Re: [vps-mail] Bayesian AND BlackLists
>
>
> Jim Smith wrote:
> > required_hits 5.0
> Wow, 5 is really low! This may result in a lot of false positives.
> >
> > I'm not sure if Bayes is working or not; I didn't see a reduction in
> > missed spam. Here are my follow-up questions:
> Look at the spamassassin headers. You will see the scores.
> > 1. What do I put in here to have the Bayes filtered items separated
> > from the other Spamassassin stuff?
> You could do that with procmail by reading the headers.
> > 2. Is there something else that I need to install (such as Bayes?) on
> > the server in order for Bayes to work?
> No, but it doesn't work until you train it and then it requires a few
> hundred messages I think. See below.
> > 3. Is there a site that I can read up on Bayes? I'm not clear how it
> > works but I'm assuming it compares the email to a database of some
> > sort. I don't know what the "auto learn" is but wouldn't mind
> > 'teaching' it if that is worthwhile.
>
> Why not start by looking at the documentation for spamassassin? It
> explains in detail how to train the Bayes filter. If you do a search on
> Bayesian filter or something like that you will find the original paper
> that explains it and even gives the math. You can also look up popfile
> which is another proxy that uses Bayes and probably has an easier to
> understand text about the philosophy.
>
> hth
> ======================================================================
> This is <vps-mail@xxxxxxxxxxxx> <http://www.perlcode.org/lists/>
> ======================================================================
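The suggestion above to read the SpamAssassin headers can be sketched in Python rather than procmail. This is a rough illustration, not the list's recommended setup. It assumes the X-Spam-Status header carries hits= (or score=), required= and tests= fields, and that Bayes rule names start with BAYES_; check the headers on your own mail before relying on either assumption. The function names and the sample message are hypothetical.

```python
import re
from email import message_from_string


def spam_status(raw_message: str):
    """Return (score, required, tests) from X-Spam-Status, or None if absent."""
    header = message_from_string(raw_message).get("X-Spam-Status")
    if header is None:
        return None
    # Long headers are folded across lines; rejoin the comma-separated test list.
    value = re.sub(r",\s+", ",", header)
    score = re.search(r"(?:hits|score)=(-?[\d.]+)", value)
    required = re.search(r"required=(-?[\d.]+)", value)
    tests = re.search(r"tests=(\S*)", value)
    names = [t for t in tests.group(1).split(",") if t] if tests else []
    return (
        float(score.group(1)) if score else None,
        float(required.group(1)) if required else None,
        names,
    )


def bayes_tests(raw_message: str):
    """Return only the rule names that look like Bayes results (assumed prefix)."""
    status = spam_status(raw_message)
    return [t for t in status[2] if t.startswith("BAYES_")] if status else []


# Hypothetical message, for illustration only.
sample = (
    "X-Spam-Status: Yes, hits=7.4 required=5.0 tests=BAYES_99,\n"
    "\tHTML_MESSAGE version=2.60\n"
    "Subject: hypothetical sample\n"
    "\n"
    "body\n"
)

print(spam_status(sample))
print(bayes_tests(sample))
```

A message for which bayes_tests returns an empty list was scored without any Bayes rule firing, which is one quick way to tell whether the Bayes database is being used yet.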