[drupal-devel] Crucial problem with mysql 4.1.x and collation

Abalieno abalieno at cesspit.net
Tue Sep 6 22:50:00 UTC 2005

This is a rather serious problem, from my point of view, and I expect it to 
become widespread if it is not solved quickly. After a few hours of 
research and a headache, here's what I discovered:

- MySQL 4.1.x adds the ability to set a collation for the database. This 
seems to be a new feature that wasn't there before.

- By default it seems that every database created or imported is 
automatically set to "latin1_swedish_ci". The whole database gets that 
collation when you install Drupal under that version of MySQL or import a 
previous dump.

- This causes serious corruption when the database is exported, because 
accented and other utf-8 characters are just NOT COMPATIBLE with the 
latin1_swedish_ci set.

- This means that if you install Drupal on MySQL 4.1.x, the very first time 
you export the database for a backup or whatever else, you'll end up with a 
corrupted dump, because all the accented characters in the nodes, comments 
and aggregator items get replaced with GARBAGE. For example: 
"Saturdayâ€™s Teen People" in place of "Saturday’s Teen People". This is 
taken from my now broken database, and since I noticed it too late I can now 
do nothing except manually fix every single entry. How fun.

- In the handbook, install.txt and all the other install guides for Drupal 
THERE IS NO MENTION of the collation. Nowhere is it written how to set the 
collation, so everyone just follows the standard instructions and ends up 
with "latin1_swedish_ci" for everything, as happened to me, including the 
text in the aggregator items, node and comment bodies.

- How the hell must the DB be configured now? From what I read here: 
http://drupal.org/node/15746#comment-36443 it's not even possible to set 
Drupal to use utf8, because it's still not compliant.
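
To make the corruption described in the list above concrete, here is a 
minimal Python sketch. It assumes (this is a guess, not something I have 
confirmed) that the garbage comes from UTF-8 bytes being read as latin1, 
which MySQL treats as Windows cp1252. The function names garble and 
ungarble are made up for illustration only.

```python
# Assumption: the text is stored as UTF-8 bytes, but somewhere on the way
# out those bytes are interpreted as latin1 (cp1252 in MySQL's case).

def garble(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")

def ungarble(text: str) -> str:
    """Undo the misreading. Only works if no byte was lost or altered."""
    return text.encode("cp1252").decode("utf-8")

original = "Saturday\u2019s Teen People"  # \u2019 is the curly apostrophe
broken = garble(original)

print(broken)                        # the garbled form
print(ungarble(broken) == original)  # whether the round trip restores it
```

If this assumption holds, some of the damage may be reversible by script 
rather than by hand. Note that Python's cp1252 codec leaves a few bytes 
undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so characters whose UTF-8 encoding 
contains one of those will raise an error here instead of being garbled.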

So, besides my database now being unrecoverable, how should I set it up so 
that it works properly from now on and I can back it up without getting 
unrecoverable garbage text?
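
One possible direction for the backup, sketched in Python under an untested 
assumption: if the tables are declared latin1 but really hold UTF-8 bytes, 
then telling mysqldump to use latin1 as well (its --default-character-set 
option) should make it write the bytes out without converting them. The 
helper name dump_command, the database name "drupal" and the user name 
"drupaluser" are all hypothetical, and the sketch only builds the command 
line; it does not run it.

```python
import shlex
from typing import List

def dump_command(db: str, user: str, charset: str = "latin1") -> List[str]:
    """Build a mysqldump command line with an explicit connection charset.

    Assumption: matching the charset to the one the tables are declared
    with avoids a conversion during the dump.
    """
    return [
        "mysqldump",
        "--default-character-set=" + charset,
        "-u", user,
        "-p",          # prompt for the password instead of embedding it
        db,
    ]

cmd = dump_command("drupal", "drupaluser")
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

It would be worth trying this on a copy of the database first and checking 
the accented characters in the resulting dump before trusting it.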

I also seriously suggest that you patch the guides and the Drupal package to 
stop this, or it will become a rather large problem, considering that 
following the instructions step by step leads unavoidably to this 
corruption problem.

- HRose / Abalieno 

