[drupal-devel] PHP string functions overloading for multibyte support
Steven Wittens
steven at acko.net
Wed May 25 22:51:13 UTC 2005
Piotr Szotkowski wrote:
>Hello.
>
>My name is Piotr Szotkowski and I'm a developer responsible for
>internationalisation, and, thus, UTF-8 support in the CiviCRM module.
>
>One of the problems I've encountered is that we'd like PHP to overload
>the string manipulation functions with their multibyte counterparts
>(as per [1]). Unfortunately, the overloading is possible only at the
>directory level, which means we'd have to overload all of Drupal's
>calls, not only CiviCRM's.
>
>After adding
>
>php_value mbstring.func_overload 7
>php_value mbstring.internal_encoding UTF-8
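
For reference, the PHP manual documents `mbstring.func_overload` as a bitmask. A value of 1 overloads `mail()`. A value of 2 overloads the basic string functions such as `strlen()`, `substr()` and `strpos()`. A value of 4 overloads the `ereg*()` regular-expression functions. The value 7 above therefore enables all three. The short Python sketch below only decodes such a value. The constant names mirror PHP's `MB_OVERLOAD_*` constants, and `decode_func_overload()` is a hypothetical helper written for illustration.

```python
# Bit values of PHP's mbstring.func_overload setting. The names mirror PHP's
# MB_OVERLOAD_* constants; this decoder itself is only an illustration.
MB_OVERLOAD_MAIL = 1    # mail()                     -> mb_send_mail()
MB_OVERLOAD_STRING = 2  # strlen(), substr(), ...    -> mb_strlen(), mb_substr(), ...
MB_OVERLOAD_REGEX = 4   # ereg*() regex functions    -> mb_ereg*()

FLAG_NAMES = {
    MB_OVERLOAD_MAIL: "mail",
    MB_OVERLOAD_STRING: "string",
    MB_OVERLOAD_REGEX: "regex",
}


def decode_func_overload(value: int) -> list:
    """Return the names of the function groups a func_overload value enables."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


print(decode_func_overload(7))  # ['mail', 'string', 'regex']
```

Bit 2 is the one that matters here. It changes what `strlen()` and `substr()` count.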
>
Actually, you will run into more problems than that. Drupal is designed
not to use mbstring, because the extension is not available on every PHP
installation. We have our own functions for handling basic UTF-8 tasks
(string truncation, MIME header encoding, etc.). These functions assume
they have direct access to the string's bytes.
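The small Python illustration below makes the byte-versus-character mismatch concrete. It is not Drupal code. A Python `bytes` object stands in for the raw string that Drupal's own routines expect. A `str` stands in for the character view that an overloaded `strlen()` reports when `mbstring.internal_encoding` is UTF-8. `copy_bytes()` is a hypothetical byte-oriented routine.

```python
# Python stand-in for PHP's two views of one UTF-8 string: len(bytes) is what
# plain strlen() returns; len(str) is what an overloaded strlen() returns.
text = "Zażółć"
data = text.encode("utf-8")

byte_length = len(data)  # 10: ż, ó, ł and ć take two bytes each
char_length = len(text)  # 6


def copy_bytes(data: bytes, length: int) -> bytes:
    """A byte-oriented routine that trusts `length` to be a byte count."""
    out = bytearray()
    for i in range(length):
        out.append(data[i])
    return bytes(out)


print(copy_bytes(data, byte_length).decode("utf-8"))  # Zażółć
print(copy_bytes(data, char_length).decode("utf-8"))  # Zażó (last two letters lost)
```

A routine that walks byte offsets only up to a character count silently drops the tail of the string. If the count happened to land inside a multibyte sequence, the result would not even be valid UTF-8.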
I don't know how thorough the mbstring overloading is, but you will
certainly run into problems. Perhaps the best solution is to make Drupal
explicitly check for mbstring and use it if present, and otherwise fall
back on its own routines.
truncate_utf8() is an excellent example of this. I believe that with the
above PHP settings it will actually truncate too much, because character
positions end up being treated as if they were UTF-8 byte positions.
Furthermore, if it counts characters rather than bytes, there is no
guarantee that the returned string is short enough, since a single UTF-8
character can occupy several bytes.
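Here is a rough sketch of byte-oriented truncation, again in Python, assuming the limit is meant in bytes. It cuts at the byte limit, then backs up past any UTF-8 continuation bytes (bit pattern `10xxxxxx`) so that no character is split. This follows the general idea a routine like truncate_utf8() needs, but it is not Drupal's actual implementation. `truncate_utf8_bytes()` and `truncate_chars()` are hypothetical names.

```python
def truncate_utf8_bytes(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 `data` to at most `max_bytes` bytes without splitting a character."""
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # Bytes of the form 10xxxxxx continue a multibyte sequence. If the first
    # byte we would drop is one of them, the cut falls inside a character,
    # so move it back to where that character starts.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


def truncate_chars(text: str, max_chars: int) -> str:
    """Character-counting truncation: the limit applies to characters, not bytes."""
    return text[:max_chars]


title = "Zażółć"  # 6 characters, 10 bytes in UTF-8
limit = 7

by_bytes = truncate_utf8_bytes(title.encode("utf-8"), limit)
print(by_bytes.decode("utf-8"), len(by_bytes))  # Zażó 6

by_chars = truncate_chars(title, limit)
print(by_chars, len(by_chars.encode("utf-8")))  # Zażółć 10 (over a 7-byte limit)
```

The byte version always respects the limit, at the cost of sometimes returning a few bytes less. The character version stays within 7 characters, but at 10 bytes it overshoots a 7-byte budget.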
Steven