On Mon, 2005-01-24 at 08:24 -0500, Jeremy Andrews wrote:
On Mon, 24 Jan 2005 13:08:46 +0000 Vladimir Zlatanov <vlado@dikini.net> wrote:
[...]
I was doing some thinking on the Bayesian + regex/custom filter strategy spam.module uses. It would be really good to modify the learning and rating algorithms to account for the short-term immediacy of the regex filters and the effect they have on the learned spam. Does anyone have any statistics on this?
Can you explain what you mean by this? I don't understand what you're proposing. Sorry.
Facts:
Custom filters are immediate: their verdict is equivalent to true or false.
The Bayesian filter accumulates evidence.

What I propose is to amend the learner with something like: if the custom filter says BAD and the learner does not, this is an error, so the learner needs to learn it. This way the learner will change its behaviour to accommodate the new BAD thing in its statistics.

A second possible addition is a simple 'meta'-evaluator, which uses the results of all filters, Bayesian and others, to judge the content. This way it can change the weight of individual filters over time, so certain filters might expire. Such an evaluator 'theoretically' has the potential to improve the overall model without adding a significant performance cost.

Jeremy, I have a suggestion for a small change to the code of the Bayesian filter. Do you want me to post it as a patch/feature or send you an email? It is not ready as a patch at the moment; it is part of that classifier I was mumbling about a month ago, but it might(tm) speed up the evaluation of the spam probability. I can't benchmark it properly at the moment.

Cheers,
Vlado

-- 
Vladimir Zlatanov <vlado@dikini.net>
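
To make the two ideas in the message above concrete, here is a rough Python sketch. It is not spam.module's actual code: every name in it (TinyBayes, custom_filter, rate_and_learn, MetaEvaluator), the example regex, the Laplace-smoothed token scoring, and the halving penalty are assumptions chosen purely for illustration. The first part trains the Bayesian learner whenever the custom filter says BAD and the learner disagrees; the second part is a weighted vote over all filters in which a filter loses weight each time it is wrong.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy token-based Bayesian filter (illustrative only)."""

    def __init__(self):
        self.spam = Counter()  # token -> occurrences seen in spam
        self.ham = Counter()   # token -> occurrences seen in non-spam

    def learn(self, text, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(text.lower().split())

    def spam_probability(self, text):
        log_odds = 0.0
        for tok in text.lower().split():
            s, h = self.spam[tok], self.ham[tok]
            p = (s + 1) / (s + h + 2)  # Laplace smoothing; unseen token -> 0.5
            log_odds += math.log(p / (1 - p))
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))


def custom_filter(text):
    """Hypothetical regex filter: an immediate True/False verdict."""
    return re.search(r"cheap\s+pills", text, re.IGNORECASE) is not None


def rate_and_learn(bayes, text, threshold=0.5):
    """Rate a message; treat 'custom says BAD, learner does not' as an error."""
    custom_bad = custom_filter(text)
    bayes_bad = bayes.spam_probability(text) > threshold
    if custom_bad and not bayes_bad:
        bayes.learn(text, True)  # fold the new BAD thing into the statistics
    return custom_bad or bayes_bad


class MetaEvaluator:
    """Weighted vote over filter verdicts; wrong filters lose weight."""

    def __init__(self, filter_names, penalty=0.5):
        self.weights = {name: 1.0 for name in filter_names}
        self.penalty = penalty  # assumed multiplicative penalty per mistake

    def judge(self, verdicts):
        # verdicts maps filter name -> bool (True means spam)
        spam_weight = sum(w for n, w in self.weights.items() if verdicts[n])
        return spam_weight > sum(self.weights.values()) / 2

    def feedback(self, verdicts, truth):
        # Once the true label is known, down-weight every filter that erred.
        for name, verdict in verdicts.items():
            if verdict != truth:
                self.weights[name] *= self.penalty
```

With this sketch, a message caught only by the regex is learned by the Bayesian part, so a later variant that the regex misses can still score above the threshold. A filter whose weight keeps shrinking contributes less and less to the vote, which is one way to read "certain filters might expire"; dropping filters below some cutoff would be a further assumption.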