[drupal-devel] spam.module suggested filter improvement

Vladimir Zlatanov vlado at dikini.net
Mon Jan 24 14:13:25 UTC 2005

> Jeremy, I have a suggestion to change the code of the Bayesian
> filter a bit. Do you want me to post it as a patch/feature, or send
> you an email? It is not ready as a patch at the moment - it is part of
> that classifier I was mumbling about a month ago, but it might(tm)
> speed up the evaluation of the spam probability. I can't benchmark it
> properly at the moment.

What the hell, I think it is better to just paste the code. It might
need more work.

function _naiveBayes($tokens) {
  $probs = 0;   // running sum of the token probabilities
  $num = 0;     // number of tokens added to $probs
  // $weight and $since are not defined in the code as posted. They are
  // placeholders here: 0 means no extra weight and no cutoff on "last".
  $weight = 0;
  $since = 0;
  $drift = variable_get('spam_min_drift', 40);
  $max = variable_get('spam_max_tokens', 40);

  // A rewrite of the original. It should not reduce the quality of the
  // predictions, and it has the potential to increase speed. The speed
  // difference depends on the cost of asort() and the number of
  // interesting tokens considered, against the added comparison and
  // evaluation in SQL.
  // $drift - minimal drift from the median, a soft-ish shoulder of the
  //          filter
  // $max   - maximum number of tokens to evaluate, a hard shoulder of
  //          the filter

  foreach ($tokens as $token) {
    // 1. It may be beneficial not to include $max, or to make it larger.
    // 2. The SQL syntax should be relatively portable to postgres,
    //    but I haven't checked the details.
    // DESC puts the tokens that drift furthest from 50 first.
    $result = db_query("SELECT probability FROM {spam_tokens} WHERE "
      ."token = '%s' AND ABS(50 - probability) >= %d AND last >= %d "
      ."ORDER BY ABS(50 - probability) DESC LIMIT %d",
      $token, $drift, $since, $max);
    $p = db_fetch_object($result);
    if ($p && $p->probability) {
      $probs += $p->probability;
    }
    else {
      $probs += variable_get('spam_unknown_probability', 40);
    }
    $num++;
  }

  // Assumption: with no tokens at all, fall back to the unknown
  // probability instead of dividing by zero.
  $rating = $num ? ($probs + $weight) / $num
                 : variable_get('spam_unknown_probability', 40);
  if ($rating > 99)
    $rating = 99;
  else if ($rating < 1)
    $rating = 1;

  return $rating;
}
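
For anyone who wants to play with the selection logic without a
database, here is a minimal stand-alone sketch of what the drift
filter and the token limit are meant to do. It is only an
illustration, not spam.module code: sketch_rating(), the $known array
(a stand-in for the {spam_tokens} table) and the choice to return the
unknown probability when no token qualifies are all assumptions of
mine.

```php
<?php
// Hypothetical stand-alone sketch, not part of spam.module.
// $known maps token => spam probability (0..100) and stands in for
// the {spam_tokens} table.
function sketch_rating($tokens, $known, $drift = 40, $max = 40, $unknown = 40) {
  $interesting = array();
  foreach ($tokens as $token) {
    $p = isset($known[$token]) ? $known[$token] : $unknown;
    // Soft shoulder: keep only tokens far enough from the neutral 50.
    if (abs(50 - $p) >= $drift) {
      $interesting[] = $p;
    }
  }
  if (!$interesting) {
    // Assumption: nothing interesting means "unknown".
    return $unknown;
  }
  // Hard shoulder: most interesting first, then cut off at $max.
  // This mirrors ORDER BY ABS(50 - probability) DESC LIMIT $max.
  usort($interesting, function ($a, $b) {
    return abs(50 - $b) <=> abs(50 - $a);
  });
  $interesting = array_slice($interesting, 0, $max);
  $rating = array_sum($interesting) / count($interesting);
  // Clamp to 1..99, as in the function above.
  return max(1, min(99, $rating));
}
```

With $known = array('viagra' => 99, 'casino' => 95, 'hello' => 45),
the tokens viagra, hello, casino rate (99 + 95) / 2 = 97, because
hello drifts only 5 from 50 and is dropped by the filter.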

Vladimir Zlatanov <vlado at dikini.net>
