Hi Steven,

After some investigation I discovered that mailhandler was already using the complementary function to mime_header_encode(): imap_mime_header_decode(). I've altered mailhandler to call drupal_convert_to_utf8() with the charset that imap_mime_header_decode() returns. I have patched listhandler to do the same thing with the from address. Now international characters will be preserved in users' names created by listhandler.

Another question: how should we handle a situation where a user has not compiled their PHP with iconv support? I made this mistake initially, and as a result drupal_convert_to_utf8() returned empty strings. drupal_convert_to_utf8() checks for the availability of several libraries, and returns nothing if none are available. I wonder if drupal_convert_to_utf8() shouldn't be patched to return the original string if no conversion library exists?

Cheers,
Mark

On 13 Mar 2005, at 15:41, Steven Wittens wrote:
On another related topic, does anyone know how to tweak the mailhandler and listhandler to deal with the non-ASCII characters (such as ø, etc.) that come through in the usernames, subject lines and message bodies? At the moment these characters are all converted to ?. I've tried some experiments with drupal_convert_to_utf8() but I've had no luck so far.
Message bodies should be easy to convert with drupal_convert_to_utf8(), provided they are transferred in 8-bit mode (Content-Transfer-Encoding), which is what the large majority of mail clients do today. You will need iconv/mbstring/recode support. I would very much advise against hacking in utf8_encode() as Tim Altman suggests, as this function can only handle ISO-8859-1 and not Windows-1252 (the Microsoft-specific variant of ISO-8859-1, with smart quotes, euro sign, etc.), which is used a lot.
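To illustrate why the ISO-8859-1 versus Windows-1252 distinction matters, here is a small sketch in Python (not PHP, purely so it is self-contained). The sample bytes are made up for the example; the point is only that bytes in the 0x80-0x9F range mean different things in the two charsets.

```python
# Hypothetical message body bytes as a Windows mail client might send them:
# 0x93 / 0x94 are curly quotes and 0x80 is the euro sign in Windows-1252.
raw = b"\x93smart quotes\x94 cost \x80 5"

# Decoding with the correct charset yields the intended characters.
as_cp1252 = raw.decode("cp1252")

# Treating the same bytes as ISO-8859-1 (which is what a Latin-1-only
# converter effectively does) yields C1 control characters instead.
as_latin1 = raw.decode("latin-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

Both decodes succeed without error, which is what makes the wrong choice easy to miss: the Latin-1 result simply contains invisible control characters where the quotes and euro sign should be.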
For subject lines and such, the situation is trickier, as a separate method of encoding these parameters is used. See RFC 2047: http://www.rfc-editor.org/rfc/rfc2047.txt
We have a function mime_header_encode(), but no mime_header_decode().
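For reference, decoding an RFC 2047 header amounts to splitting it into encoded words, undoing the Q or Base64 encoding, and then converting each piece from its declared charset. The sketch below shows that flow in Python using the standard library's email.header.decode_header as a stand-in; it is not the Drupal or PHP implementation, and the helper name decode_mime_header and the sample subject are hypothetical.

```python
from email.header import decode_header


def decode_mime_header(value: str, fallback: str = "ascii") -> str:
    """Decode an RFC 2047 header value into a Unicode string (illustrative helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                # Convert each piece from the charset declared in its encoded word.
                parts.append(chunk.decode(charset or fallback, errors="replace"))
            except LookupError:
                # Unknown charset name: fall back rather than drop the text.
                parts.append(chunk.decode(fallback, errors="replace"))
        else:
            # Headers without any encoded words come back as plain strings.
            parts.append(chunk)
    return "".join(parts)


# Made-up subject line with one Q-encoded ISO-8859-1 word.
subject = "=?ISO-8859-1?Q?Sm=F8rrebr=F8d?="
print(decode_mime_header(subject))
```

The charset returned alongside each chunk plays the same role as the charset that a header-decoding function hands to the UTF-8 conversion step.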
As far as "characters are being converted to '?'" goes, is this a real question mark or the replacement character U+FFFD (�)?
Steven Wittens
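
The fallback Mark asks about at the top of this message (return the original string when no conversion is possible, rather than an empty one) could look roughly like the sketch below. It is written in Python only for illustration; the function name is hypothetical, and an unknown charset stands in for the "no conversion library available" case, which is an assumption of this example rather than how drupal_convert_to_utf8() works.

```python
def convert_to_utf8_or_passthrough(data: bytes, charset: str) -> bytes:
    """Convert data to UTF-8, or return it unchanged if conversion is not possible."""
    try:
        return data.decode(charset).encode("utf-8")
    except (LookupError, UnicodeDecodeError):
        # No usable converter for this charset (or the bytes are invalid):
        # hand back the original bytes instead of an empty result.
        return data


print(convert_to_utf8_or_passthrough(b"Sm\xf8rrebr\xf8d", "iso-8859-1"))
print(convert_to_utf8_or_passthrough(b"Sm\xf8rrebr\xf8d", "no-such-charset"))
```

The trade-off is that a passthrough result may not be valid UTF-8, so callers would still need to treat it as unconverted input.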