Piotr Szotkowski wrote:
Hello.
My name is Piotr Szotkowski and I'm a developer responsible for internationalisation, and, thus, UTF-8 support in the CiviCRM module.
One of the problems I've encountered is that we'd like PHP to overload the string manipulation functions with their multibyte counterparts (as per [1]). Unfortunately, the overloading is possible only at the directory level, which means we'd have to overload all of Drupal's calls, not only CiviCRM's.
After adding
php_value mbstring.func_overload 7
php_value mbstring.internal_encoding UTF-8
Actually, you will run into some more problems. Drupal is designed not to use mbstring, because it is not available everywhere. We have our own functions for handling basic UTF-8 tasks (string truncation, MIME header encoding, etc.). These functions assume they get direct access to the string's bytes. I have no idea how thorough the mbstring override is, but you will certainly run into problems.

Perhaps the best solution is to make Drupal explicitly check for mbstring and use it if present, and otherwise use its own routines.

truncate_utf8() is an excellent example of this. I believe that with the above PHP settings it will actually truncate too much, because the code points of the Unicode characters are interpreted as if they were UTF-8 bytes. Furthermore, if it counts characters rather than bytes, there is no guarantee that the returned string is short enough in bytes.

Steven
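
To make the "check for mbstring, otherwise use our own routine" idea concrete, here is a minimal sketch. The function names example_truncate_utf8() and example_truncate_utf8_fallback() are hypothetical and this is not Drupal's actual truncate_utf8(). It assumes the input is valid UTF-8 and that the limit is meant in bytes.

```php
<?php
// Hypothetical helpers for illustration only; not Drupal's truncate_utf8().

// Byte-wise fallback for installations without mbstring.
// Assumes $string is valid UTF-8 and $max_bytes is a byte limit.
function example_truncate_utf8_fallback($string, $max_bytes) {
  if (strlen($string) <= $max_bytes) {
    return $string;
  }
  $len = max(0, $max_bytes);
  // UTF-8 continuation bytes have the form 10xxxxxx. If the first byte past
  // the cut is one, the cut falls inside a character, so back up to its start.
  while ($len > 0 && (ord($string[$len]) & 0xC0) === 0x80) {
    $len--;
  }
  return substr($string, 0, $len);
}

// Use mbstring explicitly when it is present, otherwise the fallback above.
function example_truncate_utf8($string, $max_bytes) {
  if (function_exists('mb_strcut')) {
    // mb_strcut() takes a length in bytes and does not split a character.
    return mb_strcut($string, 0, $max_bytes, 'UTF-8');
  }
  return example_truncate_utf8_fallback($string, $max_bytes);
}
```

Because the fallback only runs when mbstring is missing, its strlen() and substr() calls cannot be overloaded there, so counting bytes is safe in that branch. When mbstring is present, calling mb_strcut() by name gives byte-based behaviour whether or not func_overload is set.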