I'm using MySQL 3.23 and Drupal 4.5 on an Apache/Linux server, PHP version 4.10. My older version of MySQL stores all text as latin1, the equivalent of iso-8859-1 extended. But I notice that Drupal outputs pages using the utf-8 character set. This causes problems with the extended iso-8859-1 characters (Microsoft's curly quotes, etc.), which usually show up as question marks in the text.
To solve the problem, I changed the charset argument in the drupal_set_header() function to iso-8859-1. This took care of the problem. But now, of course, any UTF-8 encoded text shows funky characters.
What's the best way to get Drupal to output both UTF-8 and iso-8859-1 extended characters properly?
On Mon, 2005-02-28 at 09:33 -0500, Steve Dondley wrote:
To solve the problem, I changed the charset argument in the drupal_set_header() function to iso-8859-1. This took care of the problem. But now, of course, any UTF-8 encoded text shows funky characters.
What's the best way to get Drupal to output both UTF-8 and iso-8859-1 extended characters properly?
Probably the best PHP way is to use iconv; it is available on both Windows and *nix. I don't have a clue, though, how many hosting providers support this extension by default.
Maybe a wrapper function similar to t() and l() could do the job: something along the lines of, if iconv is present, convert from source_encoding to UTF-8, otherwise leave the text intact.
function to_utf8($text, $enc='ISO-8859-1') { if (function_exists('iconv')) { return iconv($enc,'UTF8'); } return $text; }
Vlado
function to_utf8($text, $enc = 'ISO-8859-1') {
  // Convert only when the iconv extension is loaded; otherwise return the text untouched.
  if (function_exists('iconv')) {
    return iconv($enc, 'UTF-8', $text);
  }
  return $text;
}
ever so forgetful - added $text above
Steve Dondley wrote:
I'm using MySQL 3.23 and Drupal 4.5 on an Apache/Linux server, PHP version 4.10. My older version of MySQL stores all text as latin1, the equivalent of iso-8859-1 extended. But I notice that Drupal outputs pages using the utf-8 character set. This causes problems with the extended iso-8859-1 characters (Microsoft's curly quotes, etc.), which usually show up as question marks in the text.
To solve the problem, I changed the charset argument in the drupal_set_header() function to iso-8859-1. This took care of the problem. But now, of course, any UTF-8 encoded text shows funky characters.
What's the best way to get Drupal to output both UTF-8 and iso-8859-1 extended characters properly?
Switching Drupal's encoding from UTF-8 to something else is not advisable, and furthermore it is unnecessary, as Unicode includes every character in ISO-8859-1. For example, I bet your feed is broken now, as it still says it's encoded as UTF-8. You will experience similar problems with e-mails sent by Drupal.
A properly set up Drupal site should handle any characters through UTF-8 and correctly convert everything into it. If you have old content, convert it to UTF-8 with iconv before importing. Oh, and note that stuff like "curly quotes" is actually not ISO-8859-1, it's Windows-1252.
Steven Wittens
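To make the Windows-1252 versus ISO-8859-1 point concrete, here is a minimal sketch, assuming PHP's iconv extension is loaded. The helper name utf8_hex_of() is made up for this illustration and is not part of Drupal. It converts the same legacy byte under two different source-encoding assumptions and prints the resulting UTF-8 bytes as hex.

```php
<?php
// Hypothetical helper (not a Drupal function): convert $bytes from
// $source_encoding to UTF-8 and return the result as a hex string.
function utf8_hex_of($bytes, $source_encoding) {
    return bin2hex(iconv($source_encoding, 'UTF-8', $bytes));
}

// 0x93 is the left curly double quote in Windows-1252 (CP1252).
$byte = "\x93";

// Read as CP1252, the byte becomes U+201C (left double quotation mark).
echo utf8_hex_of($byte, 'CP1252'), "\n";      // e2809c

// Read as ISO-8859-1, the same byte becomes U+0093, an invisible control character.
echo utf8_hex_of($byte, 'ISO-8859-1'), "\n";  // c293
```

Both conversions succeed, but only the CP1252 one yields a visible quote mark, which is why the source encoding passed to iconv matters for this kind of content.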
I think I've sent this link before, but just in case...
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) http://www.joelonsoftware.com/articles/Unicode.html
Hope it's helpful to someone.
Switching Drupal's encoding from UTF-8 to something else is not advisable, and furthermore it is unnecessary, as Unicode includes every character in ISO-8859-1. For example, I bet your feed is broken now, as it still says it's encoded as UTF-8. You will experience similar problems with e-mails sent by Drupal.
A properly set up Drupal site should handle any characters through UTF-8 and correctly convert everything into it. If you have old content, convert it to UTF-8 with iconv before importing. Oh, and note that stuff like "curly quotes" is actually not ISO-8859-1, it's Windows-1252.
Yes, I understand the distinction between ISO-8859-x and Windows-1252.
At any rate, the content is coming in from e-mails handled via the mailhandler module (and these are often written by people using MS applications). As best I can tell, they are not converted to standard UTF-8. I have already decided against changing the output format to ISO-8859-1 and changed it back to UTF-8. Instead, I'm in the middle of addressing this problem by inserting conversion functions into the mailhandler module. However, PHP needs to be compiled with the --with-iconv option for this to work, which can be a problem for some users.
But you say that Drupal already handles this conversion. I'm not so sure about that. I did a grep on 'iconv' and the only place I saw it used was in an XML parser function in common.inc for the feeds.
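The kind of conversion function described above might look roughly like the following. This is only a sketch under stated assumptions: mail_body_to_utf8() is a hypothetical name, not an existing mailhandler or Drupal function, and treating mail labelled ISO-8859-1 as CP1252 is a guess about what MS mail clients actually send, not something the module is known to do.

```php
<?php
// Hypothetical helper, not part of Drupal or mailhandler.
function mail_body_to_utf8($body, $charset = 'ISO-8859-1') {
    // An empty pattern with the /u modifier only matches if the subject
    // is valid UTF-8, so already-converted text is returned unchanged.
    if (preg_match('//u', $body) === 1) {
        return $body;
    }
    // Without the iconv extension, leave the text as it is.
    if (!function_exists('iconv')) {
        return $body;
    }
    // Assumption: mail declared as ISO-8859-1 may really be Windows-1252,
    // which defines the curly quotes that ISO-8859-1 lacks.
    if (strtoupper($charset) === 'ISO-8859-1') {
        $charset = 'CP1252';
    }
    // iconv() returns false on failure (e.g. a byte CP1252 leaves undefined).
    $converted = @iconv($charset, 'UTF-8', $body);
    return ($converted === false) ? $body : $converted;
}
```

One limitation worth keeping in mind: a few legacy byte sequences happen to be valid UTF-8 by accident, so the validity check can occasionally skip text that did need converting.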
On Tue, 1 Mar 2005, Steve Dondley wrote:
A properly set up Drupal site should handle any characters through UTF-8 and correctly convert everything into it. If you have old content, convert it to UTF-8 with iconv before importing. Oh, and note that stuff like "curly quotes" is actually not ISO-8859-1, it's Windows-1252.
Yes, I understand the distinction between ISO-8859-x and Windows-1252.
At any rate, the content is coming in from e-mails handled via the mailhandler modules (which are often written by people using MS applications). These are not converted to standard UTF-8 format as best as I can tell.
Indeed they aren't.
I have already decided against changing the output format to ISO-8859-1 and changed it back to UTF-8. Instead, I'm in the middle of addressing this problem by inserting conversion functions into the mailhandler module. However, PHP needs to be compiled with the --with-iconv option for this to work, which can be a problem for some users.
But you say that Drupal already handles this conversion. I'm not so sure about that. I did a grep on 'iconv' and the only place I saw it used was in an XML parser function in common.inc for the feeds.
Right. It would help if that part of the function was in a separate function. Then mailhandler could call this function.
Cheers, Gerhard
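As a rough illustration of the "separate function" idea, a standalone helper could try whichever conversion extension is available and report failure otherwise. This is a sketch, not the code in common.inc: the name convert_to_utf8_sketch() is hypothetical, and falling back to mbstring is an assumption about what a host without iconv might still offer.

```php
<?php
// Hypothetical standalone helper; the name and the fallback order are
// assumptions for illustration, not existing Drupal code.
// Returns the converted string, or false if no conversion was possible.
function convert_to_utf8_sketch($data, $encoding) {
    // Prefer iconv when PHP was built with it.
    if (function_exists('iconv')) {
        return @iconv($encoding, 'UTF-8', $data);
    }
    // Otherwise try the mbstring extension, if loaded.
    if (function_exists('mb_convert_encoding')) {
        return @mb_convert_encoding($data, 'UTF-8', $encoding);
    }
    // No conversion extension available: let the caller decide what to do.
    return false;
}

// Usage: "caf\xE9" is "café" in ISO-8859-1.
$result = convert_to_utf8_sketch("caf\xE9", 'ISO-8859-1');
echo ($result === false) ? "no converter available\n" : bin2hex($result) . "\n";
```

Returning false rather than the untouched input lets a caller such as mailhandler choose between storing the raw text and rejecting it.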