[drupal-devel] PHP string functions overloading for multibyte support

Steven Wittens steven at acko.net
Wed May 25 22:51:13 UTC 2005


Piotr Szotkowski wrote:

>Hello.
>
>My name is Piotr Szotkowski and I'm a developer responsible for
>internationalisation, and, thus, UTF-8 support in the CiviCRM module.
>
>One of the problems I've encountered is that we'd like PHP to overload
>the string manipulation functions with their multibyte counterparts
>(as per [1]). Unfortunately, the overloading is possible only at the
>directory level, which means we'd have to overload all of Drupal's
>calls, not only CiviCRM's.
>
>After adding
>
>php_value mbstring.func_overload 7
>php_value mbstring.internal_encoding UTF-8
>
>  
>
Actually, you will run into more problems than that. Drupal is designed 
not to use mbstring, because it is not available everywhere. We have our 
own functions for handling basic UTF-8 tasks (string truncation, MIME 
header encoding, etc.). These functions assume they have direct access 
to the string's bytes.
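To illustrate what "direct access to the string's bytes" means, here is a minimal Python sketch (not Drupal's actual PHP code) of byte-based UTF-8 truncation. It cuts at a byte limit and then backs up over continuation bytes (bit pattern 10xxxxxx) so that no multibyte character is split. The name truncate_utf8_bytes is hypothetical. Logic like this only works if length and offset operations count bytes, which is exactly what mbstring overloading changes.

```python
def truncate_utf8_bytes(data: bytes, max_len: int) -> bytes:
    """Truncate UTF-8 encoded bytes to at most max_len bytes without
    splitting a multibyte character (hypothetical helper, for illustration)."""
    if len(data) <= max_len:
        return data
    cut = max_len
    # data[cut] is the first byte being dropped. If it is a continuation
    # byte (10xxxxxx), its character started earlier, so move the cut
    # back until it sits on that character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


text = "héllo".encode("utf-8")       # b'h\xc3\xa9llo', 6 bytes
print(truncate_utf8_bytes(text, 2))  # b'h' (cutting at 2 would split 'é')
print(truncate_utf8_bytes(text, 3))  # b'h\xc3\xa9'
```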

I have no idea how thorough the mbstring overloading is, but you will 
certainly run into problems. Perhaps the best solution is to make Drupal 
explicitly check for mbstring and use it if present, and fall back on its 
own routines otherwise.

truncate_utf8() is an excellent example of this. I believe that with the 
above PHP settings, it will actually truncate too much, because character 
counts end up being treated as if they were UTF-8 byte counts. 
Furthermore, if it counts characters rather than bytes, there is no 
guarantee that the returned string is short enough in bytes.
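A small Python sketch of that last point, assuming a hypothetical 4-byte limit (for example, a database column size): truncating by characters, as an overloaded substr() would, can return far more bytes than the limit allows, since '€' takes 3 bytes in UTF-8. The helper truncate_chars is a hypothetical stand-in, not a Drupal or PHP function.

```python
def truncate_chars(text: str, limit: int) -> str:
    # Hypothetical stand-in for a character-counting (overloaded) substr():
    # 'limit' is interpreted as a number of characters, not bytes.
    return text[:limit]


byte_limit = 4                      # hypothetical byte budget
s = "€€€€"                          # 4 characters, 12 bytes in UTF-8
result = truncate_chars(s, byte_limit)
print(len(result), len(result.encode("utf-8")))  # 4 12
```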

Steven
