[drupal-devel] Email filter

Hi all,

I've just committed an emailfilter module which interprets a node as a plain text email. Depending on its configuration, it will attempt to indent and colorise threaded conversations, convert URLs into anchors and remove any extra linebreaks. It will also convert &, <, > etc. into HTML entities.

You can see an example of its use here: http://www.planetcocoon.com/

I'm using Drupal to collect email traffic to the Apache Cocoon mailing lists in an attempt to render it more attractively.

On another related topic, does anyone know how to tweak the mailhandler and listhandler to deal with the non-ASCII characters (such as ø, etc.) that come through in the usernames, subject lines and message bodies? At the moment these characters are all converted to ?. I've tried some experiments with drupal_convert_to_utf8() but I've had no luck so far.

As this is my first committed module, I'd appreciate some review; any feedback that people may have to offer would be gratefully received.

Regards, Mark

Your site is down...

Fatal error: Call to undefined function: drupal_set_html_head() in /var/www/html/drupal-cvs-2005-03-12/modules/geshicodefilter/GeSHicodefilter.module on line 21

Are you using an old theme?

Stefan.

On 13 Mar 2005, at 13:05, Mark Leicester wrote: [...]

Hi Stefan,

I forgot to mention that this is a bleeding-edge 4.6 installation. The geshi code filter seems to suffer from the same issue as the one mentioned here: http://drupal.org/node/10317. It's not consistent. I'm using the CVS copy of the phptemplate engine with the CVS copy of Kubrick. In this case I'm sure it's the geshi filter that needs updating. Any tips?

Mark

On 13 Mar 2005, at 12:12, Stefan Nagtegaal wrote: [...]

On Sun, 13 Mar 2005, Mark Leicester wrote:

Hi Mark!

> I've just committed an emailfilter module which interprets a node as

I suppose it will work for comments as well?

> a plain text email. Depending on its configuration, it will attempt to indent and colorise threaded conversations, convert URLs into anchors and remove any extra linebreaks. It will also convert &, <, > etc. into HTML entities.

I know some Civicspace people who will thank you for this...

> You can see an example of its use here: http://www.planetcocoon.com/
>
> I'm using Drupal to collect email traffic to the Apache Cocoon mailing lists in an attempt to render it more attractively.
>
> On another related topic, does anyone know how to tweak the mailhandler and listhandler to deal with the non-ASCII characters (such as ø, etc.) that come through in the usernames, subject lines and message bodies? At the moment these characters are all converted to ?. I've tried some experiments with drupal_convert_to_utf8() but I've had no luck so far.

I've recently exposed the utf conversion part of the xml import to the outside world. drupal_convert_to_utf8 should be able to do the job (available only in cvs/4.6rc). You'll need to find out what the original encoding was, though. I'd appreciate a patch for listhandler, if you find the time.

> As this is my first committed module, I'd appreciate some review; any feedback that people may have to offer would be gratefully received.

I'll have a look.

Cheers, Gerhard

Hi Gerhard,

Yes, it works for comments too. I created an Email content type (in addition to the three default ones: Filtered HTML, PHP etc.) which uses only the emailfilter. I've patched the listhandler to allow you to select the content type used when new content is created (and also the roles that new users are given; I will post this listhandler patch shortly).

I'm using cvs 4.6 for this site. If the original email was iso-8859-1, can I then just call drupal_convert_to_utf8($node->body, "ISO-8859-1")? It doesn't seem to be working for me: I get an empty string back. I'll double check.

Cheers, Mark

On 13 Mar 2005, at 12:14, Gerhard Killesreiter wrote: [...]

Mark Leicester said: [...]

> I'm using cvs 4.6 for this site. If the original email was iso-8859-1, can I then just call drupal_convert_to_utf8($node->body, "ISO-8859-1")? It doesn't seem to be working for me: I get an empty string back. I'll double check.

That probably means that your PHP install doesn't support any of the functions drupal_convert_to_utf8() uses. For a 4.5.2 project, I'm using the following function based on drupal_convert_to_utf8():

    // everything needs to be in UTF-8
    function utf8_me($data, $encoding = "ISO-8859-1") {
      if (function_exists('iconv')) {
        $out = @iconv($encoding, 'utf-8', $data);
      }
      else if (function_exists('mb_convert_encoding')) {
        $out = @mb_convert_encoding($data, 'utf-8', $encoding);
      }
      else if (function_exists('recode_string')) {
        $out = @recode_string($encoding . '..utf-8', $data);
      }
      else if ((function_exists('utf8_encode')) && ($encoding == "ISO-8859-1")) {
        $out = @utf8_encode($data);
      }
      return $out;
    }

It has an added check for utf8_encode(), a standard PHP function. Unfortunately, utf8_encode() only works with ISO-8859-1. I can roll a patch for drupal_convert_to_utf8() if anyone wants this in core.

-- Tim Altman

On Sun, 13 Mar 2005, Mark Leicester wrote:

Hi Mark!

> Yes, it works for comments too. I created an Email content type (in addition to the three default ones: Filtered HTML, PHP etc.) which uses only the emailfilter.

Makes sense, yes.

> I've patched the listhandler to allow you to select the content type used when new content is created (and also the roles that new users are given; I will post this listhandler patch shortly).

Good. Looking forward to it. But maybe it should rather be a mailhandler patch, as per Moshe's suggestion? Listhandler depends on mailhandler anyway and can use anything that is made available there.

> I'm using cvs 4.6 for this site. If the original email was iso-8859-1, can I then just call drupal_convert_to_utf8($node->body, "ISO-8859-1")? It doesn't seem to be working for me: I get an empty string back. I'll double check.

Check your logs, too. Steven's comments are also very useful.

Cheers, Gerhard

At first blush, this looks like a really useful module for mail handling in Drupal. Nice work ... I wonder if it shouldn't go directly into mailhandler.module? Do you see other uses beyond prettying up inbound email?

If you get the UTF8 stuff worked out, please close this issue: http://drupal.org/node/4758. Again, I think that fix should wind back into mailhandler.

CODE - you want to switch on $may_cache in emailfilter_menu(). Your call to _head() is happening twice right now. See docs for hook_menu().

On Mar 13, 2005, at 7:05 AM, Mark Leicester wrote: [...]
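
For illustration, here is a minimal sketch of what switching on $may_cache might look like. It assumes the Drupal 4.6-style hook_menu($may_cache) signature and that the call Moshe refers to as _head() is drupal_set_html_head(); the stylesheet path is hypothetical, and the stand-in function exists only so that the sketch runs outside Drupal. It is not the actual emailfilter code.

```php
<?php
// Stand-in for Drupal's drupal_set_html_head(), defined here only so the
// sketch runs outside Drupal. It accumulates markup for the page <head>.
if (!function_exists('drupal_set_html_head')) {
  function drupal_set_html_head($data = NULL) {
    static $stored_head = '';
    if (!is_null($data)) {
      $stored_head .= $data . "\n";
    }
    return $stored_head;
  }
}

/**
 * Sketch of hook_menu() for emailfilter (assumed 4.6-style signature).
 */
function emailfilter_menu($may_cache) {
  $items = array();
  if ($may_cache) {
    // Cacheable menu items would be defined here; none are assumed.
  }
  else {
    // Per-request work goes in the non-cached pass only, so the markup
    // is added once even when the hook is invoked for both passes.
    // The stylesheet path is hypothetical.
    drupal_set_html_head('<style type="text/css" media="all">@import "modules/emailfilter/emailfilter.css";</style>');
  }
  return $items;
}
```

With this shape, invoking the hook once with TRUE and once with FALSE during the same request leaves a single stylesheet entry in the head.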

Hi Moshe,

Thanks for the hook_menu() pointer. Your suggestion to wind the filter into the mailhandler sounds good to me too. I'll create a patch shortly.

As I mentioned earlier, I've patched listhandler to allow an admin to select the content type used when new content is created from received mail. I'd like to add that to the mailhandler too: what do you think?

Cheers, Mark

On 13 Mar 2005, at 13:20, Moshe Weitzman wrote: [...]

> Thanks for the hook_menu() pointer. Your suggestion to wind the filter into the mailhandler sounds good to me too. I'll create a patch shortly. As I mentioned earlier, I've patched listhandler to allow an admin to select the content type used when new content is created from received mail. I'd like to add that to the mailhandler too: what do you think?

You may commit the filter stuff directly to mailhandler.module. Please commit to both 4.5 and HEAD if possible. Each is slightly different, unfortunately ...

From your description it sounds like we don't need that listhandler patch. It can be handled with a mailhandler command. See the extended help page for mailhandler.

Hi Moshe,

I'd like to submit my mailhandler patches. I've rolled the emailfilter into mailhandler, so now mailhandler:

- provides a filter to display conversation replies
- deals with i18n characters in email subject, from address and body

Some questions/notes:

1. I've copied comment creation code into the mailhandler module (removing the drupal_goto which causes problems). You mentioned a fix was on the way. Where can I find that? Otherwise, shall I submit it as part of my patch?
2. I use the listhandler with the mailhandler. Because the listhandler creates the users on the fly, I had to move the mailhandler_switch_user call. It now follows the foreach (module_list() as $name). Can you foresee this causing any trouble?
3. I understand what you mean about using mailhandler commands to set the content type, but I would prefer to have the radio button content type selector UI available when I edit a mailbox. Are you happy for this to happen?

Being new to Drupal development, I want to check that when you say "you may commit the filter stuff directly to mailhandler.module", you mean I may go ahead and commit my changes to CVS. I have committer rights, but I don't want to step on any toes by committing inappropriately! Please confirm.

Regards, Mark

On 14 Mar 2005, at 01:09, Moshe Weitzman wrote: [...]

> On another related topic, does anyone know how to tweak the mailhandler and listhandler to deal with the non-ASCII characters (such as ø, etc.) that come through in the usernames, subject lines and message bodies? At the moment these characters are all converted to ?. I've tried some experiments with drupal_convert_to_utf8() but I've had no luck so far.

Message bodies should be easy to convert with drupal_convert_to_utf8(), provided they are transferred in 8-bit mode (Content-Transfer-Encoding), which is what the large majority of mail clients do today. You will need iconv/mbstring/recode support. I would very much advise against hacking in utf8_encode() as Tim Altman suggests, as this function can only handle ISO-8859-1 and not Windows-1252 (the Microsoft-specific variant of ISO-8859-1, with smart quotes, euro sign, etc.), which is used a lot.

For subject lines and such, the situation is trickier, as a separate method of encoding these parameters is used. See RFC 2047: http://www.rfc-editor.org/rfc/rfc2047.txt

We have a function mime_header_encode(), but no mime_header_decode().

As far as "characters are being converted to '?'" goes, is this a real question mark or the replacement character U+FFFD (�)?

Steven Wittens
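
To make the ISO-8859-1 versus Windows-1252 point concrete, here is a small sketch. It assumes PHP's iconv extension is available and that the underlying iconv library accepts 'CP1252' as a name for Windows-1252; the sample text is made up. The bytes 0x80 to 0x9F are where the two encodings differ.

```php
<?php
// Typical Windows-1252 text: curly quotes (0x93, 0x94) and a euro sign (0x80).
$raw = "\x93quoted\x94 costs 5 \x80";

// Decoded as Windows-1252, the bytes become U+201C, U+201D and U+20AC.
$as_cp1252 = iconv('CP1252', 'UTF-8', $raw);

// Decoded as ISO-8859-1, the same bytes become the invisible C1 control
// characters U+0093, U+0094 and U+0080, so the punctuation is lost.
$as_latin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

// Hex dumps make the difference visible regardless of terminal encoding.
echo bin2hex($as_cp1252), "\n";
echo bin2hex($as_latin1), "\n";
```

A converter that only understands ISO-8859-1 therefore produces valid UTF-8 for such mail, but with the smart quotes and euro sign replaced by control characters.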

Hi Steven,

After some investigation I discovered that mailhandler was already using the complementary function to mime_header_encode(): imap_mime_header_decode(). I've altered mailhandler to call drupal_convert_to_utf8() with the charset that imap_mime_header_decode() returns. I have patched listhandler to do the same thing with the from address. Now international characters will be preserved in the names of users created by listhandler.

Another question. How should we handle a situation where a user has not compiled their PHP with iconv support? I made this mistake initially, and as a result drupal_convert_to_utf8() returned empty strings. drupal_convert_to_utf8() checks for the availability of several libraries, and returns nothing if none are available. I wonder if drupal_convert_to_utf8() shouldn't be patched to return the original string if no conversion library exists?

Cheers, Mark

On 13 Mar 2005, at 15:41, Steven Wittens wrote: [...]
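
The header handling described above could be sketched roughly as follows. This is not the mailhandler patch itself: header_parts_to_utf8() is a hypothetical helper, iconv() stands in for drupal_convert_to_utf8(), and treating the 'default' charset as US-ASCII is an assumption. imap_mime_header_decode() returns an array of objects with charset and text properties; the parts are built by hand here so that the sketch runs without the IMAP extension, and the address is made up.

```php
<?php
/**
 * Converts header parts, shaped like the return value of
 * imap_mime_header_decode(), into a single UTF-8 string.
 * Returns FALSE if any part cannot be converted.
 * Hypothetical helper, not actual mailhandler code.
 */
function header_parts_to_utf8($parts) {
  $out = '';
  foreach ($parts as $part) {
    // Unencoded runs are reported with the charset 'default'; treating
    // them as US-ASCII is an assumption of this sketch.
    $charset = (strtolower($part->charset) == 'default') ? 'US-ASCII' : $part->charset;
    $text = @iconv($charset, 'UTF-8', $part->text);
    if ($text === FALSE) {
      return FALSE;
    }
    $out .= $text;
  }
  return $out;
}

// With the IMAP extension loaded, the parts would come from a call such as
// imap_mime_header_decode($raw_from_header). Built by hand here instead.
$parts = array(
  (object) array('charset' => 'ISO-8859-1', 'text' => "J\xF8rgen"),
  (object) array('charset' => 'default', 'text' => ' <j@example.org>'),
);
echo header_parts_to_utf8($parts), "\n";
```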

> Another question. How should we handle a situation where a user has not compiled their PHP with iconv support? I made this mistake initially, and as a result drupal_convert_to_utf8() returned empty strings. drupal_convert_to_utf8() checks for the availability of several libraries, and returns nothing if none are available. I wonder if drupal_convert_to_utf8() shouldn't be patched to return the original string if no conversion library exists?

Returning the original string means that garbage text is returned, or worse, that the new text is no longer valid UTF-8 (which breaks strict XML parsing, for example). The correct behaviour is to not import the data at all, as having mixed encodings in one database (without a way to distinguish them) is very, very undesirable.

For installs that don't have iconv, the admin will get a watchdog message when the function is called:

Unsupported encoding '%s'. Please install iconv, GNU recode or mbstring for PHP.

Steven Wittens
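
As a rough sketch of the "do not import" behaviour on the calling side: to_utf8_or_fail() below is a hypothetical, simplified stand-in that only tries iconv, whereas the thread says drupal_convert_to_utf8() also checks mbstring and recode. The sample text is made up.

```php
<?php
/**
 * Returns $data converted to UTF-8, or FALSE when iconv is missing or
 * the conversion fails. Hypothetical, simplified stand-in: only iconv
 * is tried here.
 */
function to_utf8_or_fail($data, $encoding) {
  if (!function_exists('iconv')) {
    // No conversion library: report failure instead of passing the
    // original bytes through unchanged.
    return FALSE;
  }
  // iconv() itself returns FALSE when the input cannot be converted.
  return @iconv($encoding, 'UTF-8', $data);
}

// Caller: skip the message rather than store text in an unknown encoding.
$body = to_utf8_or_fail("Malm\xF6", 'ISO-8859-1');
if ($body === FALSE) {
  echo "Skipping message: body could not be converted to UTF-8\n";
}
else {
  echo $body, "\n";
}
```

The point of returning FALSE rather than the input is that the caller can tell a failed conversion apart from converted text, and so never mixes encodings in the database.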

participants (6)

- Gerhard Killesreiter
- Mark Leicester
- Moshe Weitzman
- Stefan Nagtegaal
- Steven Wittens
- Tim Altman