Re: [drupal-devel] Dealing with spam (was rel=nofollow)

24 Jan 2005

On Mon, 2005-01-24 at 08:24 -0500, Jeremy Andrews wrote:
...
On Mon, 24 Jan 2005 13:08:46 +0000
Vladimir Zlatanov <vlado@dikini.net> wrote:
[...]
...
I was doing some thinking on the Bayesian + regex/custom filter strategy
spam.module uses. It would be really good to mod the learning and
rating algorithms in order to account for the immediacy of the regex
short term, and the effect it has on the learned spam. Does anyone have
any statistics on this?
Can you explain what you mean by this? I don't understand what you're
proposing.
Sorry. The facts:
Custom filters are immediate - equivalent to true or false.
The Bayesian filter accumulates evidence.

What I propose is to amend the learner by using something like:
  if the custom filter says BAD and I (the Bayesian learner) do not,
  this is an error, so I need to learn it. This way the learner will
  change its behaviour to accommodate the new BAD thing in its
  statistics.
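
A minimal sketch of that rule, in Python rather than spam.module's PHP.
Everything here is an assumption made for illustration: the TinyBayes
class, the custom_filter_says_spam stand-in, the regex, and the 0.5
threshold are hypothetical and are not taken from spam.module.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy token-count Bayesian filter (hypothetical, not spam.module code)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, label):
        # Accumulate evidence: one more document and its tokens for `label`.
        self.docs[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        # Naive Bayes log-odds with add-one smoothing.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        spam_total = sum(self.counts["spam"].values()) + vocab
        ham_total = sum(self.counts["ham"].values()) + vocab
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / spam_total
            p_ham = (self.counts["ham"][tok] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on extreme scores.
        log_odds = max(-30.0, min(30.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def custom_filter_says_spam(text):
    # Stand-in for a regex/custom filter: an immediate true/false verdict.
    return re.search(r"cheap\s+pills", text, re.IGNORECASE) is not None


def rate_and_correct(bayes, text, threshold=0.5):
    """Return (is_spam, bayes_probability), training the learner on its errors."""
    p = bayes.spam_probability(text)
    bayes_says_spam = p > threshold
    if custom_filter_says_spam(text) and not bayes_says_spam:
        # The custom filter says BAD and the learner does not: treat it
        # as an error and fold the content into the spam statistics.
        bayes.learn(text, "spam")
        return True, p
    return bayes_says_spam, p
```

The learner is only trained when it disagrees with the custom filter,
so content it already rates as spam does not inflate its counts.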

There is a second possible addition: a simple 'meta'-evaluator, which
uses the results of all filters - Bayesian and others - to judge the
content. This way it can change the weight of individual filters over
time, so certain filters might expire. Such an evaluator
'theoretically' has the potential to improve the overall model, without
adding a significant performance cost.
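
One possible shape for such a meta-evaluator, sketched in Python. The
weighting scheme is not specified above; this sketch swaps in a
weighted-majority style update (halve the weight of any filter that
disagreed with the confirmed outcome). The class name, the penalty, and
the expiry cutoff are all hypothetical.

```python
class MetaEvaluator:
    """Combines filter verdicts using per-filter weights (illustrative only)."""

    def __init__(self, filter_names, penalty=0.5, expire_below=0.05):
        self.weights = {name: 1.0 for name in filter_names}
        self.penalty = penalty            # assumed multiplicative penalty
        self.expire_below = expire_below  # assumed expiry cutoff

    def active(self):
        # Filters whose weight fell below the cutoff are treated as expired.
        return {n: w for n, w in self.weights.items() if w >= self.expire_below}

    def judge(self, verdicts):
        # verdicts maps filter name -> True (spam) or False (not spam).
        active = self.active()
        total = sum(active.values())
        spam_weight = sum(w for n, w in active.items() if verdicts.get(n))
        return total > 0 and spam_weight > total / 2

    def feedback(self, verdicts, was_spam):
        # Down-weight every filter that disagreed with the confirmed outcome.
        for name, said_spam in verdicts.items():
            if name in self.weights and said_spam != was_spam:
                self.weights[name] *= self.penalty
```

Each judgement costs one pass over the filter verdicts, which have to be
computed anyway, so the extra work per comment stays small.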

Jeremy, I have a suggestion for a small change to the code of the
Bayesian filter. Do you want me to post it as a patch/feature, or send
you an email? It is not ready as a patch at the moment - it is part of
that classifier I was mumbling about a month ago - but it might(tm)
speed up the evaluation of the spam probability. I can't benchmark it
properly at the moment.

Cheers,
Vlado 

-- 
Vladimir Zlatanov <vlado@dikini.net>