I'm dubious this would help even in 4.1 and later...

This would only help us if MySQL stored UTF-8 internally as UCS-2 (i.e., fixed two-byte Unicode), or maybe UTF-16 (i.e., two-byte code units, with characters outside the basic plane stored as a pair of them), since that would guarantee that every character we care about has 2 bytes reserved for it. I'm guessing, though, that when MySQL says the encoding in the DB is UTF-8, they mean that literally, and an Arabic character is going to take up 2 bytes of storage space, while an ASCII character will take up only 1. That is instead of the one byte that an Arabic character takes up in ISO-8859-6. So if anything, using UTF-8 will make the problem worse.

The real solution, I think, is to make the field widths wider across the board for 4.8, so that there's enough space for non-Latin scripts like Arabic.

Rob

Alaa Abd El Fattah wrote:
On Sun, 19 Mar 2006 14:08:06 -0500 "Khalid B" <kb@2bits.com> wrote:
I've hit that problem in several different places due to multibyte UTF-8 chars.
What is sufficient as a node title or anonymous commenter name in Latin letters is not enough at all when using Arabic, for instance. Arabic names for CCK types and fields will be even more of a problem.
I think this is no longer an issue. In HEAD, and in a few betas for 4.7, there is a change that forces the charset for the tables to UTF-8.
This wouldn't work with MySQL 4.0, would it?
Cheers, Alaa
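
For anyone who wants to check the per-character byte counts argued about at the top of this thread, a few lines of Python will do it. This is only a sketch of the encoding arithmetic, not anything taken from Drupal or MySQL: the helper name `byte_lengths` and the sample word are made up for illustration, and it says nothing about how a particular MySQL version counts column widths.

```python
# Hypothetical helper (not part of Drupal or MySQL): report how many
# bytes a string occupies in each of the encodings discussed above.
def byte_lengths(text):
    return {
        "iso-8859-6": len(text.encode("iso-8859-6")),  # single-byte Arabic charset
        "utf-8": len(text.encode("utf-8")),            # variable width
        "utf-16-le": len(text.encode("utf-16-le")),    # 2-byte code units, no BOM
    }

# Sample word of four Arabic letters (seen, lam, alef, meem), written with
# escapes so the source file itself stays ASCII. Chosen only as an example.
arabic = "\u0633\u0644\u0627\u0645"
latin = "node"

print(len(arabic), byte_lengths(arabic))
print(len(latin), byte_lengths(latin))
```

The four Arabic letters need 4 bytes in ISO-8859-6 but 8 in UTF-8, while the four ASCII letters need 4 bytes in both, so a width limit counted in bytes runs out twice as fast for Arabic text once the data is UTF-8.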