Re: [drupal-devel] Dealing with spam (was rel=nofollow)

24 Jan 2005

      ...
That was exactly my concern and idea. I found that on all my sites that were 
hit by spam (the five sites I maintain) the spam was exactly the same. My 
most popular of the five was hit first. But the other four still had to be 
taught to handle the *exact same spam messages*.
I believe sites like sfx can be considered "master brains", since they 
learn through a much bigger educating audience, but also because they are 
much more interesting spam targets. I did not think of spreading things like 
regexps etc. Merely about tokens etc.
I was doing some 'related' work the last couple of days. Granted it is
not spam filtering, but it is about learning to classify text.
Having a network of trusted RSS/RDF feeds would be a great way to propagate
'learnable' things, in this particular case spam. It could be even better
if we had source tracking and a content_id that is unique across the
whole network. Think of the chain:
site 1 RSS feed -> site 2 RSS feed -> site 3 RSS feed -> site 1.

Such a network will surely be diverse enough to confuse a spam generator,
but interconnected enough to be efficient.
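
A rough sketch of how source tracking plus a network-wide content_id
could keep an item from circulating forever in such a chain. This is
only an illustration in Python, not anything spam.module does: the Site
class, its methods and the "site1:42" id format are all made up here,
and direct method calls stand in for fetching a peer's feed.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site in the trusted network (hypothetical, for illustration)."""
    name: str
    peers: list = field(default_factory=list)    # sites that read our feed
    seen: set = field(default_factory=set)       # content_ids already learned
    learned: list = field(default_factory=list)  # stand-in for filter training

    def publish(self, content_id, text):
        """Flag a local message as spam and push it into the network."""
        self.receive(content_id, text, path=())

    def receive(self, content_id, text, path):
        # content_id: skip anything this site has already learned.
        # Source tracking: skip items whose path already includes this site.
        if content_id in self.seen or self.name in path:
            return
        self.seen.add(content_id)
        self.learned.append(text)
        # Re-publish in our own feed, recording ourselves in the path.
        for peer in self.peers:
            peer.receive(content_id, text, path + (self.name,))


site1, site2, site3 = Site("site 1"), Site("site 2"), Site("site 3")
site1.peers.append(site2)
site2.peers.append(site3)
site3.peers.append(site1)  # closes the loop back to site 1

site1.publish("site1:42", "buy cheap pills")
```

With this setup each of the three sites learns the message exactly
once, and the item stops when it arrives back at site 1.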

I was doing some thinking on the Bayesian + regex/custom filter strategy
spam.module uses. It would be really good to modify the learning and
rating algorithms to account for the immediate, short-term effect of the
regex filters and for the effect they have on the learned spam. Does
anyone have any statistics on this?
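
To make the question a bit more concrete, here is a toy sketch of one
way the two could interact. It is not spam.module's algorithm: the
TokenFilter class, the pattern, the 0.5 threshold and the regex_weight
knob are all assumptions of mine. A regex hit blocks the message at
once, but trains the token counts at a reduced weight, so the custom
filters do not dominate what the Bayesian side learns.

```python
import math
import re
from collections import Counter

# Hypothetical custom filter; a real site would have its own patterns.
SPAM_PATTERNS = [re.compile(r"cheap\s+pills", re.IGNORECASE)]


class TokenFilter:
    """Toy naive Bayes token filter with a regex pre-check (illustrative)."""

    def __init__(self, regex_weight=0.5):
        self.spam = Counter()
        self.ham = Counter()
        self.regex_weight = regex_weight  # assumed knob, not from spam.module

    def train(self, text, is_spam, weight=1.0):
        counts = self.spam if is_spam else self.ham
        for token in text.lower().split():
            counts[token] += weight

    def spam_probability(self, text):
        # Naive Bayes with add-one smoothing and equal priors.
        spam_total = sum(self.spam.values())
        ham_total = sum(self.ham.values())
        vocab = len(set(self.spam) | set(self.ham)) or 1
        log_odds = 0.0
        for token in text.lower().split():
            p_spam = (self.spam[token] + 1) / (spam_total + vocab)
            p_ham = (self.ham[token] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
        return 1 / (1 + math.exp(-log_odds))

    def check(self, text):
        if any(p.search(text) for p in SPAM_PATTERNS):
            # The regex acts immediately; learn from it at reduced weight.
            self.train(text, is_spam=True, weight=self.regex_weight)
            return True
        return self.spam_probability(text) > 0.5


filt = TokenFilter()
filt.train("thanks for the great module", is_spam=False)
caught_by_regex = filt.check("cheap pills online")  # matches the pattern
caught_by_bayes = filt.check("pills online today")  # no pattern match
```

Here the second message matches no pattern and is flagged only because
of the tokens learned from the regex hit. Statistics on how often that
happens on a real site are exactly what I am after.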

Cheers,
Vlado

-- 
Vladimir Zlatanov <vlado@dikini.net>