[drupal-devel] multibyte/mbstring in Drupal

Piotr Szotkowski shot at caltha.pl
Fri May 27 07:52:14 UTC 2005


Hello.

Steven Wittens:

> I updated the comment for that function recently... it is about bytes,
> not characters. It is a tough situation...not only can PHP support
> UTF-8, but also the MySQL database. And for most people, they have
> no control over the MySQL settings of their host.

Just a side note: they don't have to. The database character set can be
set at database creation (and this is done by Drupal), and everything
else (the client, connection and results character sets) can be set at
runtime with the SQL query SET NAMES 'utf8' (best run just after
establishing the connection, of course).
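
A minimal sketch of that, assuming modern PHP; set_connection_charset()
and its $runQuery callable are names made up for illustration, not
Drupal API, and the mysqli connection details are placeholders:

```php
<?php
// Hypothetical helper (not Drupal API): $runQuery is any callable that sends
// one SQL statement over an already-open connection.
function set_connection_charset(callable $runQuery, string $charset = 'utf8') {
  // Allow only plain charset names, since the value is placed in the SQL text.
  if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
    throw new InvalidArgumentException("Unexpected charset name: $charset");
  }
  // Sets the client, connection and results character sets in one statement.
  return $runQuery("SET NAMES '$charset'");
}

// Example with mysqli (connection details are placeholders):
// $db = new mysqli('localhost', 'user', 'password', 'drupal');
// set_connection_charset(function ($sql) use ($db) { return $db->query($sql); });
```

Passing the query runner in as a callable keeps the sketch independent
of whichever database layer actually holds the connection.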

> However, it is hard to use mbstring overload in Drupal. Drupal has
> to handle all sorts of encodings, and once overloaded it is impossible
> to call the original functions and you have to change the internal
> encoding back and forth.

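It's possible with some of them: you just call them with ISO-8859-1,
a single-byte encoding, so the result is a count of bytes:

if ($overloaded) {
  // strlen() is mapped to mb_strlen() when overloaded; naming it
  // explicitly makes the extra encoding argument unambiguous.
  $byteLength = mb_strlen($string, 'ISO-8859-1');
}
else {
  $byteLength = strlen($string);
}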
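
The $overloaded flag is not defined anywhere in this message. One way it
could be derived is sketched below, assuming modern PHP; byte_length()
is a made-up name, not Drupal API:

```php
<?php
// Hypothetical helper (not Drupal API): length of $string in bytes,
// whether or not mbstring overloads strlen().
function byte_length(string $string): int {
  // The value 2 in the mbstring.func_overload bitmask covers the string
  // functions. ini_get() returns false when the setting does not exist
  // (mbstring not loaded, or PHP 8+), which casts to 0 here.
  $overloaded = (((int) ini_get('mbstring.func_overload')) & 2) !== 0;
  if ($overloaded) {
    // ISO-8859-1 is a single-byte encoding, so characters equal bytes.
    return mb_strlen($string, 'ISO-8859-1');
  }
  return strlen($string);
}

// Three 2-byte UTF-8 letters followed by an ASCII "w".
$text = "\xC5\xBC\xC3\xB3\xC5\x82w";
echo byte_length($text), "\n";
```

Keeping the check inside one function means callers never need to know
whether overloading is switched on.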

> It would also complicate the code a lot in ugly ways. For example, in
> drupal_xml_parser_create(), we use ereg() on a string in an as of yet
> unknown encoding. If the internal encoding is UTF-8, the operation
> cannot be performed because most likely the string is not valid UTF-8.
> We would need to switch to a byte-based encoding, like ISO-8859-1.

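if ($overloaded) {
  // ereg() is mapped to mb_ereg() here, so match byte-wise for this call
  // and then restore whatever regex encoding was active before.
  $previousEncoding = mb_regex_encoding();
  mb_regex_encoding('ISO-8859-1');
  $result = ereg($pattern, $string);
  mb_regex_encoding($previousEncoding);
}
else {
  $result = ereg($pattern, $string);
}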

But I agree, it's uglier than the previous example - although most
probably mb_regex_encoding could simply be set once for the whole
drupal_xml_parser_create.

I'm not trying to argue here; if you decide on a "no overloading" policy
in the end, then so be it - it's just that all of the problems raised
so far seem solvable. :o)

The prospect of having the string functions overloaded is very
tempting, as otherwise we'll have to make our own wrappers/overloads
for Smarty functions, and for any other third-party packages we use.
String function overloading would take care of everything at once and
should be relatively future-proof, while each new version of Smarty
(and other packages) could require rewriting the fixes.

Cheers,
-- Shot
-- 
  We're the technical experts. We were hired so that management could ignore
  our recommendations and tell us how to do our jobs.        -- Mike Andrews
====================== http://shot.pl/hovercraft/ === http://shot.pl/1/125/ ===


