[drupal-devel] Dealing with spam (was rel=nofollow)
Jeremy Andrews
jandrews at kerneltrap.org
Mon Jan 24 15:22:05 UTC 2005
On Mon, 24 Jan 2005, Vladimir Zlatanov wrote:
[...]
> facts:
> Custom filters are immediate - equivalent to true or false.
> Bayesian filter accumulates evidence
Not a complete fact. There are four possible settings: "always spam ==
true", "usually spam == probably true", "usually not spam == probably
false", and "never spam == false".
> What I propose is to amend the learner by using something like:
> if the custom filter says BAD and I do not, this is an error, so I need
> to learn it. This way the learner will change its behaviour to
> accommodate the new BAD thing into its statistics.
This would be simple to do. Certain events could force the filter into
TEFT mode for a given message, i.e., when it matches "always spam" or
"never spam". In particular, I like this as it would auto-train the URL
filter.
I've given this thought already, and it's one of several improvements I
have planned.
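
A minimal sketch of the forced-training idea above, in Python rather than
the module's own code. Everything here is hypothetical: the TinyBayes
class, the verdict constants (named after the four settings), and
maybe_force_train() are illustrations only, not the spam module's API. It
assumes the learner is trained whenever a custom filter returns one of the
two definite verdicts, and is left alone otherwise.

```python
from collections import Counter

# Hypothetical verdict labels, named after the four custom-filter settings.
ALWAYS_SPAM, USUALLY_SPAM, USUALLY_NOT_SPAM, NEVER_SPAM = range(4)


class TinyBayes:
    """Toy token counter standing in for the Bayesian filter (assumption)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        # Add the tokens to the spam or non-spam statistics.
        (self.spam if is_spam else self.ham).update(tokens)


def maybe_force_train(bayes, tokens, custom_verdict):
    """Train the learner only when a custom filter gives a definite verdict.

    Returns True if training happened, False otherwise.
    """
    if custom_verdict == ALWAYS_SPAM:
        bayes.train(tokens, is_spam=True)
        return True
    if custom_verdict == NEVER_SPAM:
        bayes.train(tokens, is_spam=False)
        return True
    # "usually" verdicts are only hints, so the learner is not touched.
    return False
```

With this shape, a URL that matches an "always spam" custom filter would
feed its tokens into the learner's statistics without manual training.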
> There is a second possible addition to use a simple 'meta'-evaluator,
> which uses the results of all filters - Bayesian and others to judge
> the content. This way it can change the weight of individual filters
> with time, so certain filters might expire. Such an evaluator
> 'theoretically' has the potential to improve the overall model, without
> adding a significant performance cost.
Currently the module simply combines the totals of all filter methods and
decides whether or not a given message is spam based on that total. I
understand your proposal to adjust a given filter method if it disagrees
with the overall average of all filter methods. However, I'm not sure
this is realistic. The filter methods aren't really compatible -- it's
perfectly normal for one filter method to suggest a message is not spam,
and for another filter method to suggest it is spam.
(For example, if a message doesn't have any URLs, the URL filter and the
URL counter will always say this is probably not spam. That is correct
for these filter methods.)
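
To illustrate why disagreement between filter methods is normal, here is a
small Python sketch. The two filter functions, their 0-100 scores, and the
averaging against a threshold are all assumptions made for the example;
the module's real scoring and combination rule may differ.

```python
def url_filter(message):
    """Hypothetical URL filter: no URLs means 'probably not spam'."""
    return 60 if "http://" in message else 20


def keyword_filter(message):
    """Hypothetical keyword filter with a single hard-coded word."""
    return 90 if "viagra" in message.lower() else 30


def is_spam(message, filters, threshold=50):
    """Combine every filter's score and judge the message on the total.

    Averaging the total is an assumption made for this sketch.
    """
    total = sum(f(message) for f in filters)
    return total / len(filters) >= threshold


filters = [url_filter, keyword_filter]
message = "Buy viagra now"

# The URL filter says "probably not spam" and the keyword filter says
# "spam"; both are right for what they measure, and the combined total
# still flags the message.
print(url_filter(message), keyword_filter(message), is_spam(message, filters))
```

Penalising the URL filter here for disagreeing with the overall result
would be wrong, since it answered correctly for a message with no URLs.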
> Jeremy, I have a suggestion to change a bit the code of the Bayesian
> filter, do you want me to post it as a patch/feature or send you an
> email. It is not ready as a patch at the moment - it is part of
> that classifier I was mumbling about a month ago, but it might(tm)
> speed up the evaluation of the spam probability. Can't benchmark it
> properly at the moment.
The best thing to do is to open an issue in the spam project. I know you
have already emailed the idea, but please still open an issue. I will be
busy the next few days, and by opening an issue you can be sure I don't
forget to look at this...
Thanks,
-Jeremy