[drupal-devel] Dealing with spam (was rel=nofollow)

Vladimir Zlatanov vlado at dikini.net
Mon Jan 24 13:05:36 UTC 2005


> That was exactly my concern and idea. I found that on all my sites that were
> hit by spam (the five sites I maintain) the spam was exactly the same. My
> most popular of the five was hit first. But the other four had to be taught
> to handle the *exact same spam messages*.
>
> I believe sites like sfx can be considered "master brains", since they
> learn through a much bigger educating audience, but also because they are
> much more interesting spam targets. I did not think of spreading things like
> regexps etc. Merely about tokens etc.

I was doing some 'related' work the last couple of days. Granted it is
not spam filtering, but it is about learning to classify text.
Having a network of trusted RSS/RDF feeds would be a great way to propagate
'learnable' things, in this particular case spam. It could be even better
if we had source tracking and a content_id that is unique across the
whole network. Think of the chain:
site 1 RSS feed -> site 2 RSS feed -> site 3 RSS feed -> site 1.

Such a network will surely be diverse enough to confuse a generator,
yet interconnected enough to be efficient.
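
To make the chain above concrete, here is a minimal sketch of how a
network-wide content_id plus source tracking could stop an item from
circulating forever once it gets back to site 1. Everything in it is
an assumption for illustration: the Item and Site classes, the "net:42"
identifier and the direct method calls standing in for RSS/RDF feed
polling are hypothetical and not part of any existing module.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    content_id: str   # assumed to be unique across the whole network
    label: str        # what the item teaches, e.g. "spam"
    path: tuple = ()  # source tracking: sites the item has passed through


@dataclass
class Site:
    name: str
    subscribers: list = field(default_factory=list)  # sites reading our feed
    seen: dict = field(default_factory=dict)          # content_id -> Item

    def publish(self, content_id, label):
        """Originate a new learnable item on this site."""
        self.receive(Item(content_id, label))

    def receive(self, item):
        """Accept an item from a feed; return True if it was new to us."""
        if item.content_id in self.seen:
            return False  # already learned, so the loop stops here
        stamped = Item(item.content_id, item.label, item.path + (self.name,))
        self.seen[item.content_id] = stamped
        for subscriber in self.subscribers:
            subscriber.receive(stamped)
        return True


# Hypothetical three-site ring: site 1 -> site 2 -> site 3 -> site 1.
site1, site2, site3 = Site("site1"), Site("site2"), Site("site3")
site1.subscribers.append(site2)
site2.subscribers.append(site3)
site3.subscribers.append(site1)  # closes the loop

site1.publish("net:42", "spam")
print(site3.seen["net:42"].path)  # ('site1', 'site2', 'site3')
```

Each site learns the item exactly once, and the recorded path shows
which trusted sources it came through.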

I was also thinking about the Bayesian + regex/custom filter strategy
that spam.module uses. It would be really good to modify the learning and
rating algorithms to account for the immediate, short-term action of the
regex filters and the effect they have on the learned spam. Does anyone
have any statistics on this?
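
One possible reading of that idea, sketched under loud assumptions: a
regex hit rates the message as spam immediately and also feeds its
tokens to the Bayesian learner with extra weight. This is not
spam.module's actual algorithm. The HybridFilter class, the
regex_weight parameter and the sample pattern are all hypothetical,
and the scoring is plain naive Bayes with equal priors and add-one
smoothing.

```python
import math
import re
from collections import Counter


class HybridFilter:
    def __init__(self, patterns, regex_weight=3):
        self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]
        self.regex_weight = regex_weight  # hypothetical training boost
        self.spam = Counter()             # token counts seen in spam
        self.ham = Counter()              # token counts seen in non-spam

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, is_spam, weight=1):
        target = self.spam if is_spam else self.ham
        for token in self.tokens(text):
            target[token] += weight

    def bayes_score(self, text):
        """Estimate P(spam | tokens) with naive Bayes and equal priors."""
        vocab = len(set(self.spam) | set(self.ham)) + 1
        n_spam = sum(self.spam.values())
        n_ham = sum(self.ham.values())
        log_odds = 0.0
        for token in self.tokens(text):
            p_spam = (self.spam[token] + 1) / (n_spam + vocab)
            p_ham = (self.ham[token] + 1) / (n_ham + vocab)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))

    def rate(self, text):
        """Regex hits act immediately and also train the Bayesian side."""
        if any(p.search(text) for p in self.patterns):
            self.learn(text, is_spam=True, weight=self.regex_weight)
            return 1.0
        return self.bayes_score(text)


f = HybridFilter([r"cheap\s+pills"])
f.learn("meeting notes for the drupal module", is_spam=False)
before = f.bayes_score("discount pharmacy online")
hit = f.rate("cheap pills from discount pharmacy online")
after = f.bayes_score("discount pharmacy online")
```

After the regex hit, a later message that shares tokens with it but
does not match the pattern scores higher than it did before. How large
regex_weight should be is exactly the kind of question the statistics
would have to answer.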

Cheers,
Vlado
               
-- 
Vladimir Zlatanov <vlado at dikini.net>



