[vps-mail] Issues re Spamassassin & Bayes
- From: Abigail Marshall <webmaster@xxxxxxxxxxxx>
- Date: Fri, 26 Sep 2003 16:17:09 -0700
Scott, I have had the following exchange on the spamassassin
mailing list regarding issues I have had with Bayes.
Because it could affect all VPS users, I am setting forth
everything in full. Basically, the issue appears to be in
the way the VPS handles the database required for SA/Bayes.
I've figured out how to keep things working on my VPS, by
the way - my own hack seems to work, but this may not hold
true with an upgrade to version 2.6.
Note: I determined the version of Berkeley DB by running the
command:
% file ~/.spamassassin/bayes_toks*
Results: Berkeley DB 1.85 (Hash, version 2, native byte-order)
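For anyone without the file utility handy, the same check can be approximated by reading the database header directly. This is only a rough sketch: it assumes the commonly documented Berkeley DB magic numbers (0x00061561 for Hash, 0x00053162 for Btree), found at byte offset 0 in 1.85/1.86 files and at offset 12 in 2.x and later. The function names and the wording of the labels are my own, not part of any SpamAssassin or Berkeley DB tool.

```python
import struct

# Assumed magic numbers (as used by file(1) magic entries for Berkeley DB).
HASH_MAGIC = 0x00061561
BTREE_MAGIC = 0x00053162
KINDS = {HASH_MAGIC: "Hash", BTREE_MAGIC: "Btree"}


def describe_header(header):
    """Guess the Berkeley DB generation from the first 20 bytes of a file."""
    if len(header) < 20:
        return "unknown (header too short)"
    # Try both byte orders, since the file is written in native order.
    for order, label in (("<", "little-endian"), (">", "big-endian")):
        magic0, version0 = struct.unpack(order + "II", header[0:8])
        magic12, version12 = struct.unpack(order + "II", header[12:20])
        if magic0 in KINDS:
            # Old-style header: magic at offset 0, version at offset 4.
            return "Berkeley DB 1.85/1.86 (%s, version %d, %s)" % (
                KINDS[magic0], version0, label)
        if magic12 in KINDS:
            # Newer header: magic at offset 12, version at offset 16.
            return "Berkeley DB 2.x or later (%s, version %d, %s)" % (
                KINDS[magic12], version12, label)
    return "not recognised as Berkeley DB"


def sniff_berkeley_db(path):
    """Read the header of the file at `path` and describe it."""
    with open(path, "rb") as fh:
        return describe_header(fh.read(20))
```

Calling sniff_berkeley_db() on the path to a bayes_toks file should then give a one-line description comparable to the file output above.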
1) My post:
From: Abigail Marshall <abigail@xxxxx>
Bayes configuration questions
2003-09-24 20:19
Here's the issue:
System: Running SA 2.54, FreeBSD Unix, Berkeley DB 1.85
(Hash, version 2):
Problem: When bayes_toks grows to more than 5K, it becomes
corrupted during sa-learn and ultimately trashed or lost.
My solution: Set bayes_expiry_max_db_size to a lower level to
force expiry, so that bayes_toks doesn't grow too large.
I did not change bayes_expiry_min_db_size or
bayes_expiry_scan_count.
Questions/Problems:
1. Why did bayes_toks grow to more than 5k in the first
place?
Documentation for SA 2.5x (sa-learn.html) says:
> Once it hits 5000 bytes, the bayes_toks database is
> locked, and the message counter entry in that database is
> increased accordingly.
2. What is the default setting for
bayes_expiry_max_db_size in SA 2.5x, and how large should
the resulting file be?
Through experimentation, I have ended up with a setting:
bayes_expiry_max_db_size 150000
With this setting, bayes_toks never gets any larger than
2,556 kb.
According to the documentation for SA 2.6 (sa-learn.txt):
> "bayes_expiry_max_db_size" specifies both the auto-expire token count
> point, as well as the resulting number of tokens after expiry as
> described above. The default value is 150,000, which is roughly
> equivalent to a 6Mb database file if you're using DB_File.
Note that my setting is the SAME as the default for 2.6 -
but rather than a 6Mb db file, I end up with a 2.5 Mb file,
with a Bayes corpus of ~2800 or less.
Documentation for SA 2.5 (sa-learn.html) says:
> bayes_expiry_min_db_size is part of the SpamAssassin
> configuration. The default value is 100000, which is
> roughly equivalent to a 5Mb database file if you're using
> DB_File.
So here is where I am totally confused: from what I can
tell, my setting of bayes_expiry_max_db_size=150000 should
either have no effect whatsoever, or it should leave me with
a bayes_toks file that will grow to 5Mb - and I end up with a
file half that size.
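To put numbers on the mismatch, here is a rough sanity check of the figures quoted above. It is only a sketch under stated assumptions: it treats 1 Mb as 1,000,000 bytes and 1 kb as 1,000 bytes, and it assumes the documentation's "roughly equivalent" ratios apply at all to a Berkeley DB 1.85 hash file, which they may not. The variable names are mine.

```python
MB = 1_000_000  # assumption: decimal megabytes
KB = 1_000      # assumption: decimal kilobytes


def bytes_per_token(db_bytes, tokens):
    """Average on-disk bytes per token implied by a size/token-count pair."""
    return db_bytes / tokens


# Ratios implied by the two documentation excerpts quoted above.
ratio_sa26 = bytes_per_token(6 * MB, 150_000)  # SA 2.6: 150,000 tokens ~ 6Mb
ratio_sa25 = bytes_per_token(5 * MB, 100_000)  # SA 2.5: 100,000 tokens ~ 5Mb

# Largest bayes_toks size observed with bayes_expiry_max_db_size 150000.
observed_bytes = 2556 * KB

# Token counts that file size would imply under each documented ratio.
tokens_low = observed_bytes / ratio_sa25
tokens_high = observed_bytes / ratio_sa26

print("bytes/token: SA 2.6 doc %.1f, SA 2.5 doc %.1f" % (ratio_sa26, ratio_sa25))
print("implied tokens in a 2,556 kb file: %d to %d" % (tokens_low, tokens_high))
```

If those ratios hold, the file would contain somewhere around 51,000 to 64,000 tokens, well short of the 150,000 the setting refers to, which is the discrepancy I am asking about.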
NOTE: Bayes works fine for me this way, but my guess is that
with the small corpus and short expiry cycle I may see
erratic performance over time. Because my system seems to
get much more spam than ham, I end up with a 9:2
ratio of spam to ham as autolearn continues to feed incoming
email to the database.
But my main concern right now is trying to figure out why
my experience doesn't match what the documentation says I
should expect.
-Abigail
2) Response #1
From: David B Funk <dbfunk@xxxxx>
Re: Bayes configuration questions
2003-09-24 21:59
On Wed, 24 Sep 2003, Abigail Marshall wrote:
> Here's the issue:
>
> System: Running SA 2.54, FreeBSD Unix, Berkeley DB 1.85
> (Hash, version 2):
[snip..]
> -Abigail
Abigail,
Where the heck did you manage to find V1.85 of the Berkeley DB kit?
It's ancient and buggy ( I was using it a decade ago with sendmail v6 ;)
Get a reasonably modern version of Berkeley DB, say at least v3.3
or better.
Berkeley DB should have -no- trouble handling Bayes databases in the
multi-megabyte range with hundreds of thousands of tokens.
Heck, my bayes_journal file usually runs about 50k bytes. (of course
that's with 10k-20k messages per day, so it does a 0-100k
roll-over every 10 minutes).
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
3) Response #2:
Email Archive: spamassassin-talk (read-only)
From: <jm@xxxxx> (Justin Mason)
Re: Bayes configuration questions
2003-09-24 23:13
David B Funk writes:
>On Wed, 24 Sep 2003, Abigail Marshall wrote:
>
>> Here's the issue:
>>
>> System: Running SA 2.54, FreeBSD Unix, Berkeley DB 1.85
>> (Hash, version 2):
>[snip..]
>> -Abigail
>
>Abigail,
>Where the heck did you manage to find V1.85 of the Berkeley DB kit?
>It's ancient and buggy ( I was using it a decade ago with sendmail v6 ;)
Aha -- that explains it. David's right, Berkeley DB 1.85 is
very old, and has had problems noted before (in other products,
at least). IMO that's the most likely cause of the problem...
PS: I was also thinking disk quotas, but this is much more likely.
--j.
======================================================================
Technical questions regarding this list may be sent to
<vps-mail-owner@xxxxxxxxxxxx>. You may request an automated help
response by sending an email with the word 'help' (w/o quotes) in the
BODY of the message (subject is ignored) to <vps-mail-request@xxxxxxxxxxxx>.
======================================================================