[development] ALL Contrib maintainers: UTF-8 update
Steven Wittens
steven at acko.net
Sat Jan 21 19:54:44 UTC 2006
>In order to solve this, you either have to run the update for the
>specific module, or you can do it manually for every table like so:
>
>ALTER TABLE vocabulary_node_types CONVERT TO CHARACTER SET utf8;
>
>
As I explained in the original issue, do /not/ run a query like this.
It would perform an actual character set conversion, e.g. from Latin1 to UTF-8.
Drupal has been storing UTF-8 data all along; we simply never told MySQL
about it (it thought it was e.g. Latin1 data).
Example: You had the character 'é' (Unicode U+00E9) in a node. Its UTF-8
representation is (in hexadecimal bytes) "C3 A9". If the database
character set was Latin1, then MySQL thought this was 2 characters,
because in Latin1 each byte is one character.
So, if you do a character set conversion, you would end up with the
UTF-8 encoding for character C3 followed by the UTF-8 encoding for
character A9, which is "C3 83 C2 A9".
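
The byte arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: Python's codecs stand in for MySQL's character set handling, and the names (hex_bytes, stored, as_latin1, double_encoded) are made up for the illustration.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. 'C3 A9'."""
    return " ".join(f"{b:02X}" for b in data)

# The character from the example: 'é', U+00E9.
char = "\u00e9"

# What Drupal actually stored: the UTF-8 bytes for that character.
stored = char.encode("utf-8")

# What MySQL believed it had: Latin1 text, one character per byte.
as_latin1 = stored.decode("latin-1")

# A character set conversion re-encodes each of those characters as UTF-8.
double_encoded = as_latin1.encode("utf-8")

print(hex_bytes(stored))          # C3 A9
print(len(as_latin1))             # 2
print(hex_bytes(double_encoded))  # C3 83 C2 A9
```

The last line is the doubly-encoded result that the CONVERT TO CHARACTER SET query would leave in the table.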
What we want MySQL to do instead is realize that the byte stream is already
UTF-8 and that the byte sequence "C3 A9" is a single character.
The only way to do this is to convert all columns to a binary type and then
to UTF-8. This means no actual conversion is done, only re-interpretation.
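
The two-step idea (binary first, then UTF-8) can be modelled in Python as follows. This is a sketch under assumptions, not the actual update code: the function name reinterpret_latin1_as_utf8 is hypothetical, and Python strings and bytes stand in for MySQL text and binary columns. The SQL in the comments follows the pattern in the MySQL manual page referenced in this message, with t1 and c1 as placeholder table and column names.

```python
def reinterpret_latin1_as_utf8(text_as_latin1: str) -> str:
    """Relabel mislabelled Latin1 text as UTF-8 without changing its bytes."""
    # Step 1, "convert to a binary type": recover the raw bytes and drop the
    # Latin1 label. In MySQL terms (placeholder names):
    #   ALTER TABLE t1 CHANGE c1 c1 BLOB;
    raw = text_as_latin1.encode("latin-1")
    # Step 2, "then to UTF-8": attach the UTF-8 label to the same bytes.
    #   ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
    return raw.decode("utf-8")

# What a Latin1 reading makes of the stored bytes C3 A9: two characters.
misread = "\u00c3\u00a9"

fixed = reinterpret_latin1_as_utf8(misread)

print(len(misread))                        # 2
print(len(fixed))                          # 1
print(fixed.encode("utf-8") == b"\xc3\xa9")  # True: bytes are unchanged
```

The stored bytes are the same before and after; only the label MySQL attaches to them changes.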
See: http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html
Steven