[drupal-support] Charset problem

Steve Dondley stevedondley at comcast.net
Wed Mar 2 00:55:53 UTC 2005


> Switching Drupal's encoding from UTF-8 to something else is not 
> advisable; furthermore, it is unnecessary, as Unicode includes every 
> character in ISO-8859-1.  For example, I bet your feed is broken now, 
> as it still says it's encoded in UTF-8. You will see similar problems 
> with, for example, e-mails sent by Drupal.
>
> A properly set up Drupal site should handle any characters through 
> UTF-8 and correctly convert everything into it. If you have old 
> content, convert it to UTF-8 with iconv before importing. Also note 
> that stuff like "curly quotes" is actually not ISO-8859-1; it's 
> Windows-1252.
>
Yes, I understand the distinction between ISO-8859-x and Windows-1252.

At any rate, the content is coming in from e-mails handled via the 
mailhandler module, and those e-mails are often written by people using 
MS applications.  As best I can tell, they are not converted to UTF-8.  
I have already decided against changing the output encoding to 
ISO-8859-1 and have changed it back to UTF-8.  Instead, I'm in the 
middle of addressing the problem by inserting conversion functions into 
the mailhandler module.  However, PHP needs to be compiled with 
"--with-iconv" for this to work, which can be a problem for some 
users.
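
A minimal sketch of the kind of conversion step described above, written 
in Python purely for illustration (the real change would call PHP's iconv 
inside mailhandler).  The helper name to_utf8 and the alias table are 
hypothetical and not part of Drupal or mailhandler.  It assumes that mail 
labelled ISO-8859-1 may really be Windows-1252, which is where the curly 
quotes come from.

```python
# Hypothetical helper for illustration; not Drupal or mailhandler code.

# Assumption: treat ISO-8859-1 labels as Windows-1252, since MS mail
# clients often send curly quotes (0x93/0x94) under that label.
CHARSET_ALIASES = {
    "iso-8859-1": "cp1252",
    "latin1": "cp1252",
    "windows-1252": "cp1252",
}


def to_utf8(raw: bytes, declared_charset: str = "") -> bytes:
    """Return the message body re-encoded as UTF-8 bytes."""
    charset = (declared_charset or "utf-8").strip().lower()
    charset = CHARSET_ALIASES.get(charset, charset)
    try:
        text = raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong label: fall back to Windows-1252 and mark
        # any undecodable bytes with U+FFFD instead of failing.
        text = raw.decode("cp1252", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    body = b"\x93quoted\x94"  # curly quotes as sent by an MS mail client
    print(to_utf8(body, "ISO-8859-1").decode("utf-8"))
```

The fallback is a design choice: a mislabelled message still gets posted, 
with at worst a few replacement characters, rather than being dropped.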

But you say that Drupal already handles this conversion.  I'm not so 
sure about that.  I did a grep for 'iconv', and the only place I saw it 
used was in an XML parser function in common.inc for the feeds.


