These parts are of particular interest to me:
What I don't like about exchanging hard rules is that you will end up with a unified database of URLs, regexes, whatever... Hard rules mean clever people will find a way around them - just look at the progress of the email spam fight.
In the short run the regex exchange is going to work. Its downside is the downside of every overtrained learning algorithm: it loses precision big time, especially in an evolving system. It does not adapt well.
A brief comparison: regex exchange - surgery; content exchange between learning systems - holistic medicine.
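A rough sketch of the evasion problem with hard rules. The pattern and the sample comments here are made up for illustration and do not come from any real shared rule set:

```python
import re

# Hypothetical hard rule of the kind that might be exchanged between sites.
SHARED_RULE = re.compile(r"\bcheap\s+pills\b", re.IGNORECASE)

def is_spam_by_rule(text: str) -> bool:
    """Return True if the fixed rule matches the comment text."""
    return SHARED_RULE.search(text) is not None

original = "Buy cheap pills today"
evasion = "Buy ch3ap p1lls today"  # trivial character substitution

caught_original = is_spam_by_rule(original)
caught_evasion = is_spam_by_rule(evasion)
```

The first comment matches the rule and the second slips past it. Every site that imported the same rule shares the same blind spot until someone writes and redistributes a new one.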
I was thinking some more about these adaptive filters that learn and the sharing of spam posts/comments between sites. It seems to me like the p2p model would work really well in this case. We had proposed a sort of "trusted sites" system where you exchange regexps with your trusted network of sites (which, to my thinking, seems inherently *more* risky than just dealing with spam locally). In the same way, we could leverage spammers' own behavior by syndicating an unpublished feed between trusted sites of ONLY spam content. I mean, that way, if someone taps your "spam feed", who cares?! BUT, if Drupal can tap into Spread Firefox's spam.rss feed, Drupal's learning filters would learn that much faster -- and in real time, because as SFX gets hit, Drupal would have the content from the feed, which has been pre-identified as spam. Thus when and if the spammer moves to Drupal, Drupal will be one step ahead of them, having learned from SFX. Thoughts? Chris
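To make the feed idea concrete, here is a minimal sketch of a site training a local learning filter from a trusted peer's spam-only feed. Everything specific in it is an assumption for illustration: the feed is taken to be plain RSS 2.0 with `title` and `description` elements, the sample items are invented, and the filter is a simple naive Bayes classifier rather than whatever filter a real site would run.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())

class NaiveBayesFilter:
    """Tiny naive Bayes spam filter with Laplace smoothing (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        total_docs = self.docs["spam"] + self.docs["ham"]
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total = sum(self.counts[label].values())
            for tok in tokenize(text):
                score += math.log((self.counts[label][tok] + 1) / (total + vocab))
            scores[label] = score
        # Turn the log-score difference into a probability; clamp to avoid overflow.
        diff = max(min(scores["ham"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))

def spam_items(rss_xml):
    """Yield the text of each item in a spam-only RSS feed (assumed layout)."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        yield f"{title} {body}"

# Invented stand-in for a peer's spam.rss; a real feed would be fetched privately.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>spam feed</title>
<item><title>Cheap pills</title><description>Buy cheap pills online now</description></item>
<item><title>Casino bonus</title><description>Win big at our online casino now</description></item>
</channel></rss>"""

local_filter = NaiveBayesFilter()

# Legitimate content still comes from the local site.
local_filter.train("Patch for the comment module attached", "ham")
local_filter.train("How do I configure the menu module", "ham")

# Spam examples arrive pre-labelled from the trusted peer's feed.
for text in spam_items(SAMPLE_FEED):
    local_filter.train(text, "spam")

p_spam = local_filter.spam_probability("cheap pills online casino")
p_ham = local_filter.spam_probability("comment module patch")
```

The local site has never seen this spam itself, yet the filter already scores similar text as likely spam. Since only content is exchanged, each site keeps its own model and can keep adapting as the spam changes.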