[drupal-devel] Dealing with spam (was rel=nofollow)

Michael Jervis mike at fuckingbrit.com
Fri Jan 21 07:48:55 UTC 2005


------ 
If this is already being done, perhaps the easiest thing would be for
someone to write a parser that goes out and grabs this list every so often,
parses it, and updates the custom filters and URLfilters as necessary...
This begins to meet the earlier demands for sharing among projects.
------

Sounds like a good plan to me. I'll take a look at this once I've migrated,
if no one else is interested.

I intend to implement this as a new module that requires the spam module and
works with it, which keeps the code modular and separate. I think this is the
right way to go, as it allows the functionality to be removed entirely if
people don't want it.

I've had a quick first look at the spam schema. I think the blacklist entries
need to go into the custom table as regexes, but they also need marking as MT
items for management. I'd add an itemtype column or some such, and insert MT
for Movable Type items. This should make it all transparent to the spam
plugin, but allow the MT Blacklist plugin to know which items it shouldn't
touch.
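
A rough sketch of the itemtype idea, using plain PHP arrays in place of the
real custom table. The column name `itemtype`, the marker value `MT`, the
`filter` key and both function names are assumptions made up for illustration;
none of them exist in the spam module today.

```php
<?php
// Hypothetical marker for rows that the MT Blacklist module manages.
const MTBLACKLIST_ITEMTYPE = 'MT';

/**
 * Build a row for the custom filter table from one blacklist entry.
 * The 'itemtype' column is the proposed addition; it is an assumption.
 */
function mtblacklist_make_row(string $regex): array
{
    return ['filter' => $regex, 'itemtype' => MTBLACKLIST_ITEMTYPE];
}

/**
 * Remove only the rows this module owns, leaving hand-written filters alone.
 */
function mtblacklist_purge(array $rows): array
{
    return array_values(array_filter($rows, function (array $row): bool {
        return ($row['itemtype'] ?? '') !== MTBLACKLIST_ITEMTYPE;
    }));
}

// One hand-written filter and one imported entry (both made up).
$rows = [
    ['filter' => 'my-own-rule', 'itemtype' => ''],
    mtblacklist_make_row('casino-online\.example'),
];
$remaining = mtblacklist_purge($rows);
```

After the purge only the hand-written rule is left, which is the point of the
marker: the spam plugin sees ordinary filters, and the MT module can clean up
after itself without touching anything else.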

I think the first iteration should provide:

1) A manually triggered one-shot load of the master list.
2) Cron-managed periodic updates of additions and deletions.
3) A manual trigger for 2.
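
The three items above could be laid out roughly as follows. This is only a
sketch: every function name is hypothetical, `$state` stands in for whatever
persistent storage the module would use, and the fetchers are passed in so
that the example runs without network access. A real Drupal cron hook takes
no arguments, so the signature here is simplified for the sake of the example.

```php
<?php
/**
 * 1) One-shot load: replace the MT entries with the full master list.
 */
function mtblacklist_load_master(array &$state, callable $fetch_master): int
{
    $state['entries'] = array_values(array_unique($fetch_master()));
    return count($state['entries']);
}

/**
 * Shared worker for 2) and 3): apply additions, then deletions.
 */
function mtblacklist_update(array &$state, callable $fetch_changes, int $now): void
{
    $changes = $fetch_changes();
    $entries = array_merge($state['entries'], $changes['add']);
    $entries = array_diff($entries, $changes['delete']);
    $state['entries'] = array_values(array_unique($entries));
    $state['last_update'] = $now;
}

/**
 * 2) Cron entry point: cron may fire often, so only update once per interval.
 */
function mtblacklist_cron(array &$state, callable $fetch_changes, int $now, int $interval = 86400): bool
{
    if ($now - $state['last_update'] < $interval) {
        return false;
    }
    mtblacklist_update($state, $fetch_changes, $now);
    return true;
}

// Made-up data to exercise the three paths.
$state = ['entries' => [], 'last_update' => 0];
mtblacklist_load_master($state, function () { return ['a\.example', 'b\.example']; });
$changes = function () { return ['add' => ['c\.example'], 'delete' => ['a\.example']]; };
$ran = mtblacklist_cron($state, $changes, 100000);      // interval has elapsed
$skipped = mtblacklist_cron($state, $changes, 100060);  // too soon, no update
```

The manual trigger in 3) would simply call `mtblacklist_update()` directly,
bypassing the interval check.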

After this, I find the idea of the web of trust push system interesting, so
I'd like to add hooks for when items are added, or perhaps on cron, pushing
out any new items to any trusted sites. This helps people with multiple
instances do their admin once, so I think it's worthwhile for that, if not
for the original reasons for inventing it. Or perhaps WOT pull...
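
For the push variant, the bookkeeping might look something like this. It is a
sketch under assumptions: the function name, the idea of an increasing item
id, and the per-site "last pushed id" map are all invented here, and the
actual transport between sites is left out entirely.

```php
<?php
/**
 * For each trusted site, list the items added since the last push to it.
 * $items maps an increasing item id to a blacklist entry; $last_pushed maps
 * a site URL to the highest id already sent. Both layouts are assumptions.
 */
function wot_pending_pushes(array $items, array $last_pushed): array
{
    $pending = [];
    foreach ($last_pushed as $site => $last_id) {
        $new = [];
        foreach ($items as $id => $entry) {
            if ($id > $last_id) {
                $new[$id] = $entry;
            }
        }
        // Sites that are already up to date get no entry at all.
        if ($new) {
            $pending[$site] = $new;
        }
    }
    return $pending;
}

// Made-up items and sites.
$items = [1 => 'a\.example', 2 => 'b\.example', 3 => 'c\.example'];
$last_pushed = ['http://one.example/' => 1, 'http://two.example/' => 3];
$pending = wot_pending_pushes($items, $last_pushed);
```

A pull model would invert this: each site would remember the last id it
fetched and ask its trusted peers for anything newer.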

FYI: MT master list import (GPL, from Geeklog's spamx):
    /**
     * Import the blacklist
     * 
     * @param array $lines The blacklist
     * @return int number of lines imported
     */
    function _do_import ($lines)
    {
        global $_TABLES;

        $count = 0;
        foreach ($lines as $line) {
            $l = explode ('#', $line);
            $entry = trim ($l[0]);
            if (!empty ($entry)) {
                DB_query ('INSERT INTO ' . $_TABLES['spamx']
                     . ' VALUES ("MTBlacklist","' . addslashes ($entry)
                     . '")');
                $count++;
            } 
        } 

        return $count;
    } 
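
To show what that parsing does to a list, here is a database-free sketch that
follows the same steps (split on '#', trim, skip blanks). The function name
and the sample entries are made up for illustration.

```php
<?php
/**
 * Extract blacklist entries from raw lines, dropping '#' comments and blanks.
 * Follows the parsing steps of _do_import() without touching a database.
 */
function mtblacklist_parse_lines(array $lines): array
{
    $entries = [];
    foreach ($lines as $line) {
        // Everything after the first '#' is treated as a comment.
        $parts = explode('#', $line);
        $entry = trim($parts[0]);
        // Unlike empty(), this comparison keeps a literal "0" entry.
        if ($entry !== '') {
            $entries[] = $entry;
        }
    }
    return $entries;
}

// Made-up sample lines in the "entry # comment" layout.
$sample = [
    "# sample blacklist, made-up entries",
    "casino-online\\.example  # added by hand",
    "",
    "cheap-pills\\.example",
];
$entries = mtblacklist_parse_lines($sample);
```

Only the two real entries survive; the comment-only line and the blank line
are dropped.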
Likewise, importing updates:
    /**
     * Update MT Blacklist from RSS feed
     */
    function _update_blacklist ()
    {
        global $_CONF, $_TABLES, $LANG_SX00, $_SPX_CONF;

        require_once($_CONF['path'] . 'plugins/spamx/magpierss/rss_fetch.inc');
        require_once($_CONF['path'] . 'plugins/spamx/magpierss/rss_utils.inc');

        $rss = fetch_rss($_SPX_CONF['rss_url']); 
        // entries to add and delete, according to the blacklist changes feed
        $to_add = array();
        $to_delete = array();

        foreach($rss->items as $item) {
            // time this entry was published (currently unused)
            // $published_time = parse_w3cdtf( $item['dc']['date'] );
            $entry = substr($item['description'], 0, -3); // blacklist entry
            $subject = $item['dc']['subject']; // indicates addition or deletion
             
            // is this an addition or a deletion?
            if (strpos($subject, 'addition') !== false) {
                // save it to database
                $result = DB_query('SELECT * FROM ' . $_TABLES['spamx']
                     . ' WHERE name="MTBlacklist" AND value="' . addslashes($entry) . '"');
                $nrows = DB_numRows($result);
                if ($nrows < 1) {
                    $result = DB_query('INSERT INTO ' . $_TABLES['spamx']
                         . ' VALUES ("MTBlacklist","' . addslashes($entry) . '")');
                    $to_add[] = $entry;
                } 
            } else if (strpos($subject, 'deletion') !== false) {
                // delete it from database
                $result = DB_query('SELECT * FROM ' . $_TABLES['spamx']
                     . ' WHERE name="MTBlacklist" AND value="' . addslashes($entry) . '"');
                $nrows = DB_numRows($result);
                if ($nrows >= 1) {
                    $result = DB_query('DELETE FROM ' . $_TABLES['spamx']
                         . ' WHERE name="MTBlacklist" AND value="' . addslashes($entry) . '"');
                    $to_delete[] = $entry;
                } 
            } 
        } 
        $display = '<hr><p><b>' . $LANG_SX00['entriesadded'] . '</b></p><ul>';
        foreach ($to_add as $e) {
            $display .= "<li>$e</li>";
        } 
        $display .= '</ul><p><b>' . $LANG_SX00['entriesdeleted'] . '</b></p><ul>';
        foreach ($to_delete as $e) {
            $display .= "<li>$e</li>";
        } 
        $display .= '</ul>';
        SPAMX_log($LANG_SX00['uMTlist'] . $LANG_SX00['uMTlist2']
             . count($to_add) . $LANG_SX00['uMTlist3'] . count($to_delete)
             . $LANG_SX00['entries']);

        return $display;
    } 
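
The classification step in that function can be tried on its own, without
MagpieRSS or a database. The sketch below assumes the same item layout the
code above reads (`description` plus `['dc']['subject']`); the function name,
the sample items, and the guess that the three stripped characters are "..."
are assumptions, not taken from the real feed.

```php
<?php
/**
 * Split feed items into additions and deletions, as _update_blacklist() does,
 * but on plain arrays. 'description' holds the entry plus three trailing
 * characters; ['dc']['subject'] contains 'addition' or 'deletion'.
 */
function mtblacklist_classify(array $items): array
{
    $changes = ['add' => [], 'delete' => []];
    foreach ($items as $item) {
        $entry = substr($item['description'], 0, -3);
        $subject = $item['dc']['subject'];
        if (strpos($subject, 'addition') !== false) {
            $changes['add'][] = $entry;
        } elseif (strpos($subject, 'deletion') !== false) {
            $changes['delete'][] = $entry;
        }
        // Items with any other subject are ignored.
    }
    return $changes;
}

// Made-up items; the '...' suffix is an assumption about the feed format.
$items = [
    ['description' => 'a\.example...', 'dc' => ['subject' => 'addition']],
    ['description' => 'b\.example...', 'dc' => ['subject' => 'deletion']],
    ['description' => 'c\.example...', 'dc' => ['subject' => 'other']],
];
$changes = mtblacklist_classify($items);
```

Keeping this step separate from the queries would also make it easy to reuse
for a Drupal module, where only the storage calls would need replacing.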

Just in case anyone gets to it before me, to avoid re-invention of wheels.

Any input?

Mike