Hello.

Steven Wittens:
> Actually you will run into some more problems. Drupal is designed to not use mbstring, because it is not available everywhere. We have our own functions for handling basic UTF-8 stuff (string truncation, mime header encode, etc). These functions assume they get direct access to the string's bytes.
I'll take a look at mime_header_encode() later (at last I'll have a reason to thoroughly research how non-ASCII characters should be encoded in MIME headers...), but am I right in thinking the below should bulletproof truncate_utf8()?

diff -ur drupal-4.6.0/includes/common.inc drupal/includes/common.inc
--- drupal-4.6.0/includes/common.inc  2005-04-12 00:50:41.000000000 +0200
+++ drupal/includes/common.inc  2005-05-26 14:08:03.994239172 +0200
@@ -1707,10 +1707,12 @@
   if ($wordsafe) {
     while (($string[--$len] != ' ') && ($len > 0)) {};
   }
-  if ((ord($string[$len]) < 0x80) || (ord($string[$len]) >= 0xC0)) {
-    return substr($string, 0, $len);
+  if (!(ini_get('mbstring.func_overload') & 2)) {
+    if ((ord($string[$len]) < 0x80) || (ord($string[$len]) >= 0xC0)) {
+      return substr($string, 0, $len);
+    }
+    while (ord($string[--$len]) < 0xC0) {};
   }
-  while (ord($string[--$len]) < 0xC0) {};
   return substr($string, 0, $len);
 }

I guess that the truncate_utf8() function is used (among other places) for VARCHAR fields in MySQL. Does Drupal require MySQL 4.1 and assume it's properly configured (so MySQL's VARCHAR length is in characters, not bytes), or should this function ensure that the return value is at most $len *bytes* long?
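If the answer turns out to be "at most $len bytes", something along these lines might do. It is only a rough sketch for discussion: truncate_utf8_bytes() is a made-up name (not a Drupal function), and it assumes strlen(), substr() and the [] index all work on bytes, i.e. that the string functions are not overloaded by mbstring.

```php
<?php
// Hypothetical helper, not part of Drupal: cut $string down to at most
// $max_bytes bytes without splitting a multi-byte UTF-8 sequence.
// Assumption: strlen()/substr() operate on bytes (no mbstring overloading).
function truncate_utf8_bytes($string, $max_bytes) {
  if (strlen($string) <= $max_bytes) {
    return $string;
  }
  $len = $max_bytes;
  // Bytes of the form 10xxxxxx (0x80..0xBF) are continuation bytes. If the
  // first dropped byte is one of them, the cut falls inside a character,
  // so step back until it lands on a character boundary.
  while ($len > 0 && (ord($string[$len]) & 0xC0) == 0x80) {
    $len--;
  }
  return substr($string, 0, $len);
}
```

With "a\xC3\xA9" (an "a" followed by a two-byte character) and a limit of 2 bytes, the cut would fall in the middle of the second character, so the sketch backs off and returns just "a".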
> I have no idea how thorough mbstring override is,
http://www.php.net/manual/en/ref.mbstring.php#AEN80049 - the thoroughness depends on the mbstring.func_overload setting.
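For reference, the setting is a bitmask: 1 overloads mail(), 2 overloads the string functions (strlen(), substr(), strpos() and friends), 4 overloads the regular expression functions. A minimal sketch of the check, with string_functions_overloaded() being a name I made up for illustration:

```php
<?php
// Hypothetical helper: reports whether mbstring has replaced the plain
// string functions (strlen(), substr(), ...) with multibyte-aware ones.
function string_functions_overloaded() {
  if (!extension_loaded('mbstring')) {
    return false;
  }
  // ini_get() returns false when the directive does not exist;
  // the (int) cast turns that into 0, so the test below stays false.
  return ((int) ini_get('mbstring.func_overload') & 2) != 0;
}
```

Code that needs raw byte access could branch on this before touching $string[$i] or calling substr().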
> but you will certainly run into problems. Perhaps the best solution is to make Drupal explicitly check for mbstring and use it if present, otherwise use its own routines.
That's what I'm trying to do. I think patching Drupal (and having the "classic" functions work as multibyte ones) is still better than patching Smarty and all the other third-party packages to use mb_substr() instead of substr(), etc.

Of course I'm open to any comments on this approach.

Cheers,
-- Shot
-- 
I hate leaving Windows 95 boxes publically accessible, so shifting even
to NT is a blessing in some ways. At least I can reboot them remotely in
a sane manner, rather than having to send them malformed packets.
                                                      -- _BOFHJournal_
====================== http://shot.pl/hovercraft/ === http://shot.pl/1/125/ ===