[drupal-devel] Crucial problem with mysql 4.1.x and collation
On 9/6/05, Abalieno <abalieno@cesspit.net> wrote:
This is a rather serious problem, from my point of view, and I foresee it becoming widespread if it is not solved quickly. After a few hours of research and a headache, here's what I discovered:
- MySQL 4.1.x adds the possibility of setting a character set and collation for the database. This seems to be a new feature that wasn't there before.
- By default, every database created or imported seems to be set automatically to "latin1_swedish_ci" (the default collation of the latin1 character set). The whole database gets that collation when you install Drupal under that version of MySQL or import a previous dump.
- This causes serious corruption when the database is exported, because accented and other UTF-8 characters are just NOT COMPATIBLE with the latin1_swedish_ci set.
- This means that if you install Drupal on MySQL 4.1.x, the very first time you export the database, for a backup or anything else, you'll end up with a corrupted dump: all the accented characters in nodes, comments and aggregator items get replaced with GARBAGE. For example, "Saturday’s Teen People" instead of "Saturday’s Teen People". This is taken from my now broken database, and since I noticed it too late, all I can do now is fix every single entry by hand. How fun.
- In the handbook, INSTALL.txt and all the other Drupal install guides, THERE IS NO MENTION of the collation. Nowhere does it say how to set it, so everyone just follows the standard instructions and ends up with "latin1_swedish_ci", as happened to me, and that covers the text of aggregator items and node and comment bodies too.
- How the hell must the DB be configured now? From what I read here: http://drupal.org/node/15746#comment-36443 it's not even possible to set Drupal to use utf8, because Drupal is still not compliant.
So, besides having my database now unrecoverable, how should I set things up so that it works properly from now on and can be backed up without turning into unrecoverable garbage text?
I also strongly suggest patching the guides and the Drupal package to prevent this; otherwise it will become a rather large problem, since following the instructions step by step leads unavoidably to this corruption.
- HRose / Abalieno
I managed to find a workaround, but it should be printed IN BOLD in the manual. By default, not only does Drupal (MySQL) set itself up as latin1, but the connection collation is set to utf8_general. This mismatch causes the conflict. If you use phpMyAdmin to set the connection collation to latin1 as well, the DB no longer comes out corrupted in dumps. In my case, though, it's too late, since I don't have any recent backups taken before the corruption. -Abalieno
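The workaround above was applied through phpMyAdmin. For command-line backups, a comparable approach, sketched here as an assumption rather than something tested in this thread, is to make the dump connection use latin1 too, via the `--default-character-set` option of `mysqldump` (the `mysql` client accepts the same option for restores). With both the column charset and the connection charset set to latin1, the server has nothing to convert, so the stored bytes are written out unchanged. The helper name `dump_latin1` and the database and user names are hypothetical.

```shell
# Hypothetical helper: dump a database whose latin1 tables actually hold
# UTF-8 bytes. A latin1 connection matches the latin1 columns, so mysqldump
# writes the bytes as they are instead of converting them to the connection's utf8.
dump_latin1() {
  db=$1
  user=$2
  mysqldump --default-character-set=latin1 -u "$user" -p "$db"
}

# Usage (hypothetical names):
#   dump_latin1 drupal drupaluser > drupal-backup.sql
# Restore through a latin1 connection as well:
#   mysql --default-character-set=latin1 -u drupaluser -p drupal < drupal-backup.sql
```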
I faced something similar, but not the same. My hosting account is on MySQL 4.0.25, while my development server runs 4.1.11.
When I dumped the development database and tried to load the dump on the hosting account, I got an error saying that the phrase DEFAULT CHARSET is not valid.
So I resorted to this:
gunzip -c dbdump.sql.gz | sed -e 's/DEFAULT CHARSET=latin1//' | mysql -uusername -ppassword dbname
This removes that clause while loading.
I did not notice a problem with accented characters, though. Maybe I should check more closely.
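The `sed` expression above only removes the exact string `DEFAULT CHARSET=latin1`. A slightly broader filter, sketched here and not tested against real 4.1.x dumps, also catches other character sets and a trailing `COLLATE=` clause in the `CREATE TABLE` options. The function name `strip_charset` is hypothetical, and a dump can carry charset information elsewhere (for example in column definitions) that this filter does not touch.

```shell
# Hypothetical filter: drop a table-level " DEFAULT CHARSET=..." clause and an
# optional following " COLLATE=..." clause, which MySQL 4.0 rejected when
# loading a 4.1 dump in the report above.
strip_charset() {
  sed -e 's/ DEFAULT CHARSET=[A-Za-z0-9_]*\( COLLATE=[A-Za-z0-9_]*\)\{0,1\}//'
}

# Usage, same pipeline as above with placeholder credentials:
#   gunzip -c dbdump.sql.gz | strip_charset | mysql -uusername -p dbname
```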
Well, since MySQL 4.0.x supports latin1 just fine, there should not be any problem. Goba
participants (3)
- Abalieno
- Gabor Hojtsy
- Khalid B