[drupal-devel] Dealing with spam (was rel=nofollow)

Vladimir Zlatanov vlado at dikini.net
Mon Jan 24 14:02:06 UTC 2005


On Mon, 2005-01-24 at 08:24 -0500, Jeremy Andrews wrote:
> On Mon, 24 Jan 2005 13:08:46 +0000
> Vladimir Zlatanov <vlado at dikini.net> wrote:
> 
> [...]
> 
> > I was doing some thinking on the Bayesian + regex/custom filter strategy
> > spam.module uses. It would be really good to mod the learning and
> > rating algorithms in order to account for the immediacy of the regex
> > short term, and the effect it has on the learned spam. Does anyone have
> > any statistics on this?
> 
> Can  you explain what you mean by this?  I don't understand what you're
> proposing.
Sorry

Facts:
Custom filters are immediate - their verdict is effectively true or false.
The Bayesian filter accumulates evidence over time.

What I propose is to amend the learner with something like:
  if the custom filter says BAD and I (the Bayesian learner) do not,
  this is an error, so I need to learn it. This way the learner will
  change its behaviour to accommodate the new BAD thing in its statistics.
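
A minimal sketch of this train-on-error idea, written in Python for
brevity rather than in spam.module's PHP. The TinyBayes class, the
custom_filter_says_bad() helper, the blocked-word list and the 0.9
threshold are all hypothetical and invented for illustration; none of
them is taken from spam.module.

```python
import math
from collections import defaultdict


class TinyBayes:
    """Toy token-count Bayesian learner (hypothetical, not spam.module code)."""

    def __init__(self):
        self.counts = {"spam": defaultdict(int), "ham": defaultdict(int)}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, tokens, label):
        # Count each distinct token once per document.
        self.docs[label] += 1
        for token in set(tokens):
            self.counts[label][token] += 1

    def spam_probability(self, tokens):
        # Sum per-token log-likelihood ratios, with add-one smoothing.
        log_odds = 0.0
        for token in set(tokens):
            p_spam = (self.counts["spam"].get(token, 0) + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"].get(token, 0) + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on extreme evidence.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


def custom_filter_says_bad(text, blocked=("casino",)):
    # Stand-in for a regex/custom filter: an immediate true/false verdict.
    return any(word in text.lower() for word in blocked)


def classify(learner, text, threshold=0.9):
    tokens = text.lower().split()
    bayes_bad = learner.spam_probability(tokens) >= threshold
    custom_bad = custom_filter_says_bad(text)
    if custom_bad and not bayes_bad:
        # The custom filter says BAD and the learner disagrees:
        # treat it as the learner's error and train on it.
        learner.learn(tokens, "spam")
    return custom_bad or bayes_bad
```

The learner is only trained when it disagrees with the custom filter, so
content it already rates as spam adds no extra work.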

There is a second possible addition: a simple 'meta'-evaluator,
which uses the results of all filters - Bayesian and others - to judge
the content. This way it can change the weight of individual filters
over time, so certain filters might expire. Such an evaluator
'theoretically' has the potential to improve the overall model without
adding a significant performance cost.
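
One possible shape for such a meta-evaluator, again as a rough Python
sketch. It uses a weighted vote with a multiplicative penalty for
filters that turn out to be wrong; that update rule is my choice for
illustration, not something decided here. The MetaEvaluator class, the
penalty and floor values, and the assumption that the true label comes
from somewhere (e.g. a moderator's decision) are all hypothetical.

```python
class MetaEvaluator:
    """Weighted vote over filter verdicts (hypothetical sketch)."""

    def __init__(self, filter_names, penalty=0.5):
        # Every filter starts with the same weight.
        self.weights = {name: 1.0 for name in filter_names}
        self.penalty = penalty

    def judge(self, verdicts):
        # verdicts maps filter name -> True (spam) or False (not spam).
        spam = sum(self.weights[n] for n, bad in verdicts.items() if bad)
        ham = sum(self.weights[n] for n, bad in verdicts.items() if not bad)
        return spam > ham

    def feedback(self, verdicts, actual_is_spam):
        # Shrink the weight of every filter whose verdict was wrong.
        for name, bad in verdicts.items():
            if bad != actual_is_spam:
                self.weights[name] *= self.penalty

    def expired(self, floor=0.1):
        # Filters whose weight fell below the floor can be retired.
        return [n for n, w in self.weights.items() if w < floor]
```

The per-item cost is one pass over the filter verdicts, so the extra
work is small compared with running the filters themselves.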

Jeremy, I have a suggestion to change the code of the Bayesian filter
a bit. Do you want me to post it as a patch/feature, or send you an
email? It is not ready as a patch at the moment - it is part of
that classifier I was mumbling about a month ago, but it might(tm)
speed up the evaluation of the spam probability. I can't benchmark it
properly at the moment.

Cheers,
Vlado 

-- 
Vladimir Zlatanov <vlado at dikini.net>



