[development] 'name' column length in variables

Rob Thorne rob at torenware.com
Mon Mar 20 07:06:18 UTC 2006


I'm dubious this would help even in 4.1 and after...

This would only help us if MySQL stored UTF-8 text internally as UCS-2 
(i.e., fixed two-byte Unicode), since that would guarantee that each and 
every character has exactly 2 bytes reserved for it.  UTF-16 comes close, 
but it still needs 4 bytes (a surrogate pair) for anything outside the 
Basic Multilingual Plane.

I'm guessing, though, that when MySQL says the encoding in the DB is 
UTF-8, they mean that literally, and an Arabic letter is going to take up 
2 bytes of storage space, while an ASCII character will take up only 1.  
This is instead of the one byte that an Arabic letter takes up in 
ISO-8859-6.  So if anything, using UTF-8 will make the problem worse.
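
For what it's worth, here is a quick sketch in Python (nothing from 
Drupal or MySQL) that counts characters versus encoded bytes.  The sample 
strings and the byte_lengths helper are made up for illustration, and it 
only shows what the encodings themselves do, not how MySQL actually 
allocates column storage.

```python
def byte_lengths(text, encoding="utf-8"):
    """Return (number of characters, number of encoded bytes) for text."""
    return len(text), len(text.encode(encoding))

# Hypothetical sample values, not taken from any real schema.
ascii_name = "name"
arabic_name = "\u0627\u0633\u0645"  # three Arabic letters (alef, seen, meem)

print(byte_lengths(ascii_name))                 # (4, 4)
print(byte_lengths(arabic_name))                # (3, 6)
print(byte_lengths(arabic_name, "iso8859_6"))   # (3, 3)
```

So a column whose limit is counted in bytes holds only half as many 
Arabic letters in UTF-8 as it does in ISO-8859-6.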

The real solution, I think, is to widen these fields across the board in 
4.8, so that there's enough space for non-Latin scripts like Arabic.

Rob

Alaa Abd El Fattah wrote:
> On Sun, 19 Mar 2006 14:08:06 -0500
> "Khalid B" <kb at 2bits.com> wrote:
>
>   
>>> I've hit that problem in several different places due to multibyte
>>> utf-8 chars
>>>
>>> what is sufficient as a node title or anonymous commenter name in
>>> latin letters is not enough at all when using arabic for instance,
>>> arabic names for cck types and fields will be more of a problem.
>>>       
>> I think this is no longer an issue. In HEAD, and a few betas for 4.7,
>> there is a change to force the charset for the tables to be utf-8.
>>     
>
> this wouldn't work with mysql 4.0 would it?
>
> cheers,
> Alaa
>   


