[development] a bikeshed color problem

Bèr Kessels ber at webschuur.com
Sat Sep 23 21:11:57 UTC 2006


On Friday 22 September 2006 05:11, Augustin (Beginner) wrote:
> > If all Drupal web sites were collaborating on gathering useful data, and
> > passing on this data to relevant organizations, we might collectively
> > achieve something.
> > One spam report against one IP might achieve nothing, but a concerted
> > effort to systematically denounce bad IPs might force people to take
> > positive actions.
> >
> > I really don't know how such a thing could be organized. One has to study
> > first how organizations fighting spam and organizations setting up
> > blacklists operate.

I used to publish my/our spam export lists from our host. What I did was 
simple: pipe all the IP addresses and all the blocked domains from the MySQL 
database into a text file and put that file online. It was downloaded exactly 
124 times: about 100 times by bots and at least 4 times by me (tests). That 
leaves roughly 20 people interested in this data. 
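A minimal sketch of that kind of export, assuming the blocked IPs and domains have already been fetched from the database as plain lists. The function names and the one-entry-per-line format with `#` section headers are illustrative assumptions, not the actual layout of the published file:

```python
import ipaddress
from pathlib import Path


def render_blocklist(ips, domains):
    """Turn blocked IPs and domains into a one-entry-per-line text export.

    `ips` and `domains` stand in for rows already fetched from the MySQL
    tables; the '#' section headers are a hypothetical format.
    """
    # Normalise and de-duplicate; an invalid IP raises ValueError.
    # Entries are sorted as strings, which is enough for a stable file.
    clean_ips = sorted({str(ipaddress.ip_address(ip.strip())) for ip in ips})
    clean_domains = sorted({d.strip().lower() for d in domains if d.strip()})
    lines = ["# blocked IPs", *clean_ips, "# blocked domains", *clean_domains]
    return "\n".join(lines) + "\n"


def export_blocklist(ips, domains, path):
    """Write the export to `path` so it can be served as a static file."""
    Path(path).write_text(render_blocklist(ips, domains), encoding="utf-8")
```

Keeping the rendering separate from the file write makes it easy to serve the same text from a static file or a dynamic page.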

Spam.module has an import/export function which I have used several times, and 
I must say that it works. People will argue that it won't work, but I can 
assure you: if you have a starter set for all the Bayesian tokens, you're at 
least five weeks of training (on an average blog) ahead of someone who doesn't 
have one. Spam.module comes (or used to; I haven't checked in a while) with 
SQL dumps to fill your filters.
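To illustrate why a starter set saves weeks of training, here is a simplified, hypothetical token store for a naive Bayesian filter: importing another site's spam/ham counts immediately gives tokens meaningful probabilities instead of the neutral 0.5 a fresh install starts with. This is not spam.module's code or dump format; the class, method names and clamping bounds are invented for the example:

```python
from collections import Counter


class TokenStore:
    """Hypothetical per-token spam/ham counts for a naive Bayesian filter."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of legitimate messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Learn from one message the slow way, by classifying it by hand."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham.update(tokens)
            self.ham_msgs += 1

    def import_starter(self, spam_counts, ham_counts, spam_msgs, ham_msgs):
        """Merge counts exported by another site (the 'starter')."""
        self.spam.update(spam_counts)
        self.ham.update(ham_counts)
        self.spam_msgs += spam_msgs
        self.ham_msgs += ham_msgs

    def spamminess(self, token):
        """Estimate P(spam | token); unseen tokens stay neutral at 0.5."""
        s = self.spam[token] / max(self.spam_msgs, 1)
        h = self.ham[token] / max(self.ham_msgs, 1)
        if s + h == 0:
            return 0.5
        # Clamp so a single token never yields absolute certainty.
        return min(0.99, max(0.01, s / (s + h)))
```

A fresh store rates every token 0.5 until enough messages have been classified; a seeded store can judge on day one.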

Right. How about a distribution system for this? Let's say I ping a flock of 
five or six sites over XML-RPC to ask whether they have new tokens, IPs, etc. 
If they do, I update my database with what (some of) these sites have 
learned. "Together we learn a lot more." Each of these five or six sites does 
the same. This exponential network enables you to get huge amounts of spammer 
data with each
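A rough, in-process simulation of such a flock, assuming each site exposes a hypothetical `spam.getUpdates` XML-RPC method that returns everything it knows (bad IPs, tokens, etc.). To keep the sketch runnable without a network, the XML-RPC request and response bodies are built and parsed with the standard library's `xmlrpc.client.dumps` and `xmlrpc.client.loads`, and the "transport" is a direct method call:

```python
import xmlrpc.client


class Site:
    """One site in the flock; names and method are hypothetical."""

    def __init__(self, name, known):
        self.name = name
        self.known = set(known)   # bad IPs, tokens, etc.
        self.peers = []           # the five or six sites we poll

    def handle(self, request_xml):
        """Server side: answer a spam.getUpdates call with what we know."""
        _params, method = xmlrpc.client.loads(request_xml)
        if method != "spam.getUpdates":
            raise ValueError(f"unsupported method: {method}")
        # An XML-RPC response carries exactly one value: here, a list.
        return xmlrpc.client.dumps((sorted(self.known),), methodresponse=True)

    def poll_peers(self):
        """Client side: ask each peer for its data and merge it locally."""
        request = xmlrpc.client.dumps((), methodname="spam.getUpdates")
        learned = set()
        for peer in self.peers:
            (items,), _ = xmlrpc.client.loads(peer.handle(request))
            learned.update(items)
        new = learned - self.known
        self.known |= new
        return len(new)  # how much this round taught us
```

Because every site also passes on what it learned from its own peers, data spreads beyond the direct neighbours within a few rounds. Note that this sketch trusts every peer blindly.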

Now, imagine I am a smart spammer (I still need to update my CV one day) and 
I actually know about this P2P system for updating each other's tokens. In 
fact, I am so smart (I really need to write that CV) that I know how to 
reverse engineer those tokens. I learn, for example, that "bikeshed" is 
bounced as a word. I then use this data mine to improve my spamming 
techniques, and write mails that no longer contain the word "bikeshed" or any 
color known in the

Basically, I, as a smart spammer, can use that 'data mine' just as well as anyone.
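To make the threat concrete: if the shared data includes per-token spam probabilities, stripping the "known bad" words from a message is trivial. A hypothetical sketch; the function name and the 0.9 threshold are illustrative assumptions:

```python
def evade(message, spam_probability, threshold=0.9):
    """Drop every word the shared data marks as likely spam.

    `spam_probability` maps token -> P(spam | token), i.e. exactly what an
    open peer-to-peer token exchange would hand to anyone who joins it.
    Unknown words default to a neutral 0.5 and are kept.
    """
    kept = [w for w in message.split()
            if spam_probability.get(w.lower(), 0.5) < threshold]
    return " ".join(kept)
```

For example, with leaked scores for "bikeshed", "green" and "cheap", the message "Cheap bikeshed paint in green today" comes out as "paint in today", which the receiving filter no longer recognizes.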

So before we can use such a ring/flock/group/P2P update system, we need to 
find a way to sort out trust. Options I see right now are GPG/PGP keyrings, 
eBay-like trust ratings, or the ability to define who can access your data 
mine. 
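Of those options, the simplest to sketch is the last one: only answer requests from peers you have explicitly listed. The example authenticates each request with an HMAC over a per-peer shared secret, using Python's standard `hmac` module. This is a swapped-in, symmetric-key stand-in for the GPG/PGP keyring idea, not an implementation of it; the peer names, secrets and `spam.getUpdates` method name are made up:

```python
import hashlib
import hmac
import json

# Hypothetical allowlist: peer name -> shared secret agreed out of band.
TRUSTED_PEERS = {
    "example-site-a": b"secret-a",
    "example-site-b": b"secret-b",
}


def sign_request(peer, secret):
    """Requesting site: identify itself and attach an HMAC-SHA256 tag."""
    body = json.dumps({"peer": peer, "method": "spam.getUpdates"}, sort_keys=True)
    tag = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return body, tag


def answer_request(body, tag, known, trusted=TRUSTED_PEERS):
    """Serving site: hand out our data mine only to peers on the allowlist."""
    peer = json.loads(body).get("peer")
    secret = trusted.get(peer)
    if secret is None:
        return None  # unknown site, e.g. a curious spammer
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return None  # wrong key or tampered request
    return sorted(known)
```

A shared secret per peer does not scale to a large ring (every pair needs its own secret), and without a nonce or timestamp a captured request could be replayed. Public-key signatures such as GPG/PGP, or reputation scores, address those limits better.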
