I think I've sent this link before, but just in case...
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) http://www.joelonsoftware.com/articles/Unicode.html
Hope it's helpful to someone.
Steven Wittens wrote:
Steve Dondley wrote:
I'm using MySQL 3.23 and Drupal 4.5 on an Apache/Linux server, PHP version 4.10. My older version of MySQL stores all text as latin1, the equivalent of iso-8859-1 extended. But I notice that Drupal outputs pages using the utf-8 character set. This is causing problems with the extended iso-8859-1 characters (Microsoft's curly quotes, etc.), which usually show up as question marks in the text.
To solve the problem, I changed the charset argument in the drupal_set_header() function to iso-8859-1. This took care of the problem. But now, of course, any UTF-8 encoded text shows funky characters.
What's the best way to get Drupal to output both UTF-8 and iso-8859-1 extended characters properly?
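To make the symptom concrete, here is a minimal Python sketch (not Drupal or PHP code; the variable names are made up for illustration) of what a browser sees when the declared charset does not match the stored bytes. It assumes the curly quotes were saved as single Windows-1252 bytes.

```python
# Curly quotes as a Windows-1252 editor would store them: bytes 0x93 and 0x94.
stored = "\u201cquoted\u201d".encode("cp1252")

# Page declared as UTF-8: 0x93 and 0x94 are not valid UTF-8 on their own,
# so a lenient decoder substitutes U+FFFD (often drawn as a question mark).
as_utf8 = stored.decode("utf-8", errors="replace")

# The reverse mismatch: genuine UTF-8 text on a page declared as ISO-8859-1.
# Each byte of a multi-byte sequence is shown as its own character.
utf8_bytes = "\u00e9".encode("utf-8")   # e-acute is the two bytes 0xC3 0xA9
mojibake = utf8_bytes.decode("latin-1")

print(repr(as_utf8))
print(repr(mojibake))
```

The first case matches the question marks described above, and the second matches the "funky characters" that appear after changing the header to iso-8859-1.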
Switching Drupal's encoding from UTF-8 to something else is not advisable, and furthermore it is unnecessary, as Unicode includes every character in ISO-8859-1. For example, I bet your feed is broken now, as it still declares itself as UTF-8. You will run into similar problems with e-mails sent by Drupal.
A properly set up Drupal site should handle any characters through UTF-8 and correctly convert everything into it. If you have old content, convert it to UTF-8 with iconv before importing. Oh, and note that stuff like "curly quotes" is actually not ISO-8859-1, it's Windows-1252.
Steven Wittens
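
The conversion step suggested above can be sketched in Python as follows. This is only an illustration of the idea, roughly what `iconv -f WINDOWS-1252 -t UTF-8` does; the function name `to_utf8` and the sample bytes are hypothetical, and it assumes the old content really is Windows-1252.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode legacy bytes as UTF-8 (hypothetical helper, for illustration)."""
    # Decode with the encoding the text was actually written in,
    # then encode the resulting Unicode string as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


# Sample legacy content: curly quotes (0x93, 0x94) around "caf" + e-acute (0xE9).
legacy = b"\x93caf\xe9\x94"
converted = to_utf8(legacy)

# In ISO-8859-1, byte 0x93 is a C1 control code, not a quotation mark,
# which is why the source must be treated as Windows-1252.
first_as_latin1 = legacy.decode("latin-1")[0]
first_as_cp1252 = legacy.decode("cp1252")[0]

print(converted)
print(hex(ord(first_as_latin1)), hex(ord(first_as_cp1252)))
```

Note that Python's `cp1252` codec raises `UnicodeDecodeError` on the few byte values Windows-1252 leaves undefined (such as 0x81), so a real import would need to decide how to handle those.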