[drupal-devel] PHP string functions overloading for multibyte support

Piotr Szotkowski shot at caltha.pl
Thu May 26 12:25:14 UTC 2005


Hello.

Steven Wittens:

> Actually you will run into some more problems. Drupal is designed to
> not use mbstring, because it is not available everywhere. We have our
> own functions for handling basic UTF-8 stuff (string truncation, mime
> header encode, etc). These functions assume they get direct access to
> the string's bytes.

I'll take a look at mime_header_encode() later (at last I'll have
a reason to thoroughly research how non-ASCII characters should be
encoded in MIME headers...), but am I right in thinking the patch
below should bulletproof truncate_utf8()?

diff -ur drupal-4.6.0/includes/common.inc drupal/includes/common.inc
--- drupal-4.6.0/includes/common.inc    2005-04-12 00:50:41.000000000 +0200
+++ drupal/includes/common.inc  2005-05-26 14:08:03.994239172 +0200
@@ -1707,10 +1707,12 @@
   if ($wordsafe) {
     while (($string[--$len] != ' ') && ($len > 0)) {};
   }
-  if ((ord($string[$len]) < 0x80) || (ord($string[$len]) >= 0xC0)) {
-    return substr($string, 0, $len);
+  if (!(ini_get('mbstring.func_overload') & 2)) {
+    if ((ord($string[$len]) < 0x80) || (ord($string[$len]) >= 0xC0)) {
+      return substr($string, 0, $len);
+    }
+    while (ord($string[--$len]) < 0xC0) {};
   }
-  while (ord($string[--$len]) < 0xC0) {};
   return substr($string, 0, $len);
 }

I guess that the truncate_utf8() function is used (among other places)
for VARCHAR fields in MySQL. Does Drupal require MySQL 4.1 and assume
it's properly configured (so MySQL's VARCHAR length is in characters,
not bytes), or should this function ensure that the return value is at
most $len *bytes* long?
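
If the answer is "bytes", a byte-bounded variant could look roughly
like the sketch below. truncate_utf8_bytes() is a hypothetical name,
not anything in Drupal; it assumes valid UTF-8 input and that strlen()
and substr() count bytes (i.e. mbstring.func_overload is not
overloading them).

```php
<?php
// Hypothetical helper (not Drupal's truncate_utf8()): cut $string to at
// most $max_bytes bytes without splitting a multi-byte UTF-8 sequence.
// Assumes strlen()/substr() are byte-based and the input is valid UTF-8.
function truncate_utf8_bytes($string, $max_bytes) {
  if (strlen($string) <= $max_bytes) {
    return $string;
  }
  $len = $max_bytes;
  // Continuation bytes have the form 10xxxxxx (0x80..0xBF). If the first
  // dropped byte is one of them, the cut falls inside a character, so
  // step back until the cut sits just before a lead byte.
  while ($len > 0 && (ord($string[$len]) & 0xC0) == 0x80) {
    $len--;
  }
  return substr($string, 0, $len);
}
```

The result may be shorter than $max_bytes, but never longer, which is
the property a byte-limited column would need.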

> I have no idea how thorough mbstring override is,

http://www.php.net/manual/en/ref.mbstring.php#AEN80049 -
the thoroughness depends on the mbstring.func_overload setting.
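
For reference, a check against that setting can be sketched as below.
The bit values (1 for mail(), 2 for the string functions, 4 for the
ereg family) are the ones the manual lists;
string_functions_overloaded() is a made-up helper name used only for
illustration.

```php
<?php
// Hypothetical helper: report whether mbstring is overloading the
// plain string functions (strlen(), substr(), strpos(), ...).
// Documented bits of mbstring.func_overload:
//   1 = mail(), 2 = string functions, 4 = ereg family.
function string_functions_overloaded() {
  // ini_get() returns false when the setting does not exist (for
  // example when mbstring is not loaded); the int cast maps that to 0.
  $overload = (int) ini_get('mbstring.func_overload');
  return ($overload & 2) != 0;
}
```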

> but you will certainly run into problems. Perhaps the best solution
> is to make Drupal explicitly check for mbstring and use it if present,
> otherwise use its own routines.

That's what I'm trying to do. I think patching Drupal (and having
the "classic" functions work as multibyte ones) is still better
than patching Smarty and all the other third-party packages to
use mb_substr() instead of substr(), etc. Of course I'm open to
any comments on this approach.
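
To make the "check for mbstring, otherwise use own routines" idea
concrete, here is a minimal sketch. utf8_prefix() is a hypothetical
name, not an existing Drupal function, and the fallback assumes valid
UTF-8 input. The fallback only runs when mbstring is absent, so
strlen() and substr() cannot be overloaded there and work on bytes.

```php
<?php
// Hypothetical wrapper: return the first $chars characters of a UTF-8
// string, using mbstring when it is available.
function utf8_prefix($string, $chars) {
  if (function_exists('mb_substr')) {
    return mb_substr($string, 0, $chars, 'UTF-8');
  }
  // Fallback without mbstring: count characters by counting bytes that
  // are not continuation bytes (10xxxxxx), and cut before the first
  // character past the limit.
  $bytes = strlen($string);
  $count = 0;
  for ($i = 0; $i < $bytes; $i++) {
    if ((ord($string[$i]) & 0xC0) != 0x80) {
      if ($count == $chars) {
        return substr($string, 0, $i);
      }
      $count++;
    }
  }
  return $string;
}
```

Callers get character semantics either way, without third-party code
having to know which branch was taken.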

Cheers,
-- Shot
-- 
 I hate leaving Windows 95 boxes publically accessible, so shifting even to
 NT is a blessing in some ways. At least I can reboot them remotely in a sane
 manner, rather than having to send them malformed packets.   -- _BOFHJournal_
====================== http://shot.pl/hovercraft/ === http://shot.pl/1/125/ ===


