Re: [drupal-devel] Dealing with spam (was rel=nofollow)

24 Jan 2005

On Mon, 24 Jan 2005, Vladimir Zlatanov wrote:

[...]
...
> facts:
> Custom filters are immediate - equivalent to true or false.
> Bayesian filter accumulates evidence.
Not a complete fact.  There are four possible settings: "always spam" ==
true, "usually spam" == probably true, "usually not spam" == probably
false, "never spam" == false.
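To make the "four settings, not a boolean" point concrete, here is a minimal sketch. The enum name, member names and probability values are assumptions for illustration only, not the spam module's actual constants. Only the two definitive settings yield a hard verdict; the two "usually" settings contribute evidence without deciding on their own.

```python
from enum import Enum
from typing import Optional


class CustomAction(Enum):
    """Four custom-filter settings, each paired with an assumed spam probability."""
    ALWAYS_SPAM = 0.99       # hypothetical value
    USUALLY_SPAM = 0.75      # hypothetical value
    USUALLY_NOT_SPAM = 0.25  # hypothetical value
    NEVER_SPAM = 0.01        # hypothetical value


def hard_verdict(action: CustomAction) -> Optional[bool]:
    """Return True/False for the two definitive settings, None for the graded ones."""
    if action is CustomAction.ALWAYS_SPAM:
        return True
    if action is CustomAction.NEVER_SPAM:
        return False
    # "usually" settings only shift the probability; they do not decide alone
    return None
```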
...
> What I propose is to amend the learner by using something like:
>  if the custom filter says BAD and I don't, this is an error, so I need
>  to learn it. This way the learner will change its behaviour to
>  accommodate the new BAD thing in its statistics.
This would be simple to do.  Certain events could force the filter into
TEFT (train-everything) mode for a given spam, i.e., when a message
matches "always spam" or "never spam".  In particular, I like this as it
would auto-train the URL filter.  I've given this thought already, and
it's one of several improvements I have planned.
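Both ideas in this exchange can be sketched together: a definitive custom-filter match forces the learner to train on the message (Jeremy's forced-TEFT idea), while `only_on_error=True` trains only when the learner disagrees (Vladimir's train-on-error idea). The `TokenLearner` class, the `classify` hook, the add-one smoothing and the 0.9 threshold are all hypothetical; this is a toy naive-Bayes learner, not the spam module's Bayesian filter.

```python
from collections import Counter

# Hypothetical labels for the four custom-filter settings.
ALWAYS_SPAM, USUALLY_SPAM, USUALLY_NOT_SPAM, NEVER_SPAM = (
    "always spam", "usually spam", "usually not spam", "never spam")


class TokenLearner:
    """Toy naive-Bayes token learner; illustrative only, not the spam module's code."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.n_spam = self.n_ham = 0

    def train(self, tokens, is_spam):
        # Count each distinct token once per training message.
        (self.spam if is_spam else self.ham).update(set(tokens))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def token_prob(self, token):
        # Add-one smoothing keeps the probability strictly between 0 and 1.
        s = (self.spam[token] + 1) / (self.n_spam + 2)
        h = (self.ham[token] + 1) / (self.n_ham + 2)
        return s / (s + h)

    def score(self, tokens):
        # Naive-Bayes combination of per-token spam probabilities.
        ps = ph = 1.0
        for t in set(tokens):
            p = self.token_prob(t)
            ps *= p
            ph *= 1.0 - p
        return ps / (ps + ph)


def classify(learner, tokens, custom_action=None, threshold=0.9, only_on_error=False):
    """Hypothetical hook: a definitive custom-filter verdict triggers training."""
    if custom_action in (ALWAYS_SPAM, NEVER_SPAM):
        verdict = custom_action == ALWAYS_SPAM
        learner_says_spam = learner.score(tokens) >= threshold
        # only_on_error=True is the train-on-error variant: learn only on disagreement.
        if not only_on_error or learner_says_spam != verdict:
            learner.train(tokens, verdict)
        return verdict
    # No definitive match: fall back to the learner's own judgement.
    return learner.score(tokens) >= threshold
```

With forced training every "always spam" or "never spam" match feeds the statistics, which is what would let such matches auto-train the filter; the train-on-error variant skips messages the learner already classifies correctly.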
...
> There is a second possible addition: a simple 'meta'-evaluator, which
> uses the results of all filters, Bayesian and others, to judge the
> content. This way it can change the weight of individual filters over
> time, so certain filters might expire. Such an evaluator
> 'theoretically' has the potential to improve the overall model without
> adding a significant performance cost.
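One way such a meta-evaluator could work is sketched below, using a weighted-majority-style multiplicative update swapped in for illustration (the thread does not specify an algorithm). Each filter's weight shrinks by an assumed factor `beta` whenever its own verdict contradicts a confirmed label, and a filter whose weight falls below an assumed `floor` is treated as expired. `MetaEvaluator`, `beta` and `floor` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MetaEvaluator:
    """Weighted-majority-style combiner over per-filter spam probabilities (sketch)."""
    beta: float = 0.8    # assumed penalty factor applied when a filter is wrong
    floor: float = 0.05  # assumed weight below which a filter counts as expired
    weights: dict = field(default_factory=dict)

    def combine(self, scores):
        """scores maps filter name -> spam probability in [0, 1]."""
        active = {}
        for name, p in scores.items():
            w = self.weights.setdefault(name, 1.0)  # new filters start at weight 1
            if w >= self.floor:
                active[name] = (w, p)
        total = sum(w for w, _ in active.values())
        if total == 0:
            return 0.5  # no live filters: stay neutral
        return sum(w * p for w, p in active.values()) / total

    def update(self, scores, is_spam):
        """Shrink the weight of each filter whose verdict contradicted the confirmed label."""
        for name, p in scores.items():
            if p == 0.5:
                continue  # a neutral score is treated as an abstention
            if (p > 0.5) != is_spam:
                self.weights[name] = self.weights.get(name, 1.0) * self.beta
```

The cost per message is one pass over the filter scores, which fits the "no significant performance cost" expectation. Note, though, that this update penalizes a filter whenever its individual verdict disagrees with the label, which is exactly the behaviour Jeremy questions in his reply.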
Currently the module simply combines the total of all filter methods, and
decides whether or not a given message is spam based on that total.  I
understand your proposal to adjust a given filter method if it's in
disagreement with the overall average of all filter methods.  However, I'm
not sure this is realistic.  The filter methods aren't really
compatible -- it's perfectly normal for one filter method to suggest a
message is not spam, and for another filter method to suggest it is
spam.

(For example, if a message doesn't have any URLs, the URL filter and the 
URL counter will always say this is probably not spam.  That is correct 
for these filter methods.)
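The "combine the total, then decide" approach, and why disagreement with the total is a poor error signal, can be illustrated with a minimal sketch. It assumes each filter method returns a spam probability and that the total is a plain average compared against a threshold of 0.5; the function names, filter names and scores are hypothetical, and the real module's arithmetic may differ.

```python
def combined_verdict(scores, threshold=0.5):
    """Average per-filter spam probabilities and compare against a threshold (assumed)."""
    avg = sum(scores.values()) / len(scores)
    return avg, avg >= threshold


def disagreeing_filters(scores, threshold=0.5):
    """Names of filters whose own verdict differs from the combined verdict."""
    _, is_spam = combined_verdict(scores, threshold)
    return sorted(name for name, p in scores.items() if (p >= threshold) != is_spam)


# A message with no URLs: the URL-based methods lean "not spam", as they should.
scores = {"bayesian": 0.99, "url_filter": 0.4, "url_count": 0.4}
avg, is_spam = combined_verdict(scores)
print(round(avg, 3), is_spam)       # 0.597 True
print(disagreeing_filters(scores))  # ['url_count', 'url_filter']
```

Both URL methods come out as "disagreeing" even though "probably not spam" is the correct output for them on a URL-less message, so an evaluator that down-weighted them for this would be learning the wrong lesson.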
...
> Jeremy, I have a suggestion to change the code of the Bayesian filter a
> bit; do you want me to post it as a patch/feature request, or send you
> an email? It is not ready as a patch at the moment (it is part of that
> classifier I was mumbling about a month ago), but it might(tm) speed up
> the evaluation of the spam probability. I can't benchmark it properly
> at the moment.
The best thing to do is to open an issue in the spam project.  I know you 
have already emailed the idea, but please still open an issue.  I will be 
busy the next few days, and by opening an issue you can be sure I don't 
forget to look at this...

Thanks,
  -Jeremy

Jeremy Andrews