I updated the comment for that function recently: it is about bytes, not characters. It is a tough situation. UTF-8 support is a question not only for PHP but also for the MySQL database, and most people have no control over the MySQL settings of their host. If MySQL is not configured for UTF-8, strings will still work, but the column sizes will be in bytes. Right now we assume the worst case (no mbstring, no MySQL UTF-8).

I think it is a reasonable expectation that hosts with mbstring also use UTF-8 for the database. People who don't have a UTF-8 database will have to disable mbstring. We can provide a section for this in settings.php, enable mbstring by default if it is present, and force the character set to UTF-8.

So, for every "utf8" function, we make an mbstring version, which works on characters, and a non-mbstring version, which works on bytes (but is UTF-8 aware). I'd like a global toggle between the two modes, set in settings.php. That way we can control encodings inside Drupal. We can provide specific UTF-8-aware APIs, and separate text processing from byte array processing. They are different in many ways, but PHP does not provide a distinction. Text processing in Drupal has been handicapped ever since we went to UTF-8. This allows us to rectify that and reintroduce things like upper/lowercasing for all languages (needed for search).

However, it is hard to use mbstring overload in Drupal. Drupal has to handle all sorts of encodings, and once the string functions are overloaded it is impossible to call the originals, so you have to keep switching the internal encoding back and forth. It would also complicate the code a lot in ugly ways. For example, in drupal_xml_parser_create(), we use ereg() on a string in an as-yet-unknown encoding. If the internal encoding is UTF-8, the operation cannot be performed because most likely the string is not valid UTF-8. We would need to switch to a byte-based encoding, like ISO-8859-1.
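To make the two-mode idea concrete, here is a minimal sketch of a length function that counts characters either way. The constant names, the `$GLOBALS['multibyte']` toggle and the `example_utf8_strlen()` name are made up for illustration; in the proposal the toggle would be set in settings.php rather than detected inline.

```php
<?php
// Hypothetical mode constants, for illustration only.
define('UNICODE_SINGLEBYTE', 0); // no mbstring: byte-based, but UTF-8 aware
define('UNICODE_MULTIBYTE', 1);  // mbstring present: character-based

// Assumed global toggle. Here it simply follows whether mbstring is loaded.
$GLOBALS['multibyte'] = function_exists('mb_strlen')
  ? UNICODE_MULTIBYTE
  : UNICODE_SINGLEBYTE;

/**
 * Count the characters in a UTF-8 string in either mode.
 */
function example_utf8_strlen($text) {
  if ($GLOBALS['multibyte'] == UNICODE_MULTIBYTE) {
    return mb_strlen($text, 'UTF-8');
  }
  // Byte-based fallback: drop UTF-8 continuation bytes (10xxxxxx).
  // Each remaining byte is the first byte of exactly one character.
  return strlen(preg_replace('/[\x80-\xBF]/', '', $text));
}
```

For valid UTF-8 both branches return the same count, so callers never need to know which mode is active. Plain strlen() stays available for the places that really need a byte count.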
There are also several places where it is important that we count in bytes, not characters. For example, MIME header encoding: the output has to be wrapped at no more than 80 bytes per line. If mbstring overload is enabled, we again need to switch the internal encoding to ensure we can count in bytes.

That's why I'd say that we do not support mbstring overload in Drupal, but update our UTF-8 APIs so they can take advantage of mbstring if it is present. We cannot use the plain, non-overloaded versions of strtolower() and friends, because they will mess up UTF-8. Instead, we can provide wrappers around strtolower(), ucfirst(), etc. If mbstring is present, they use it (and support all of Unicode); otherwise we fall back to a poor man's ASCII version which leaves non-ASCII characters alone.

I can provide optimized routines; I've written dozens of UTF-8 processors in PHP. We already have several utf8 utility functions which resemble mb_ API calls anyway (string_length, truncate_utf8, ...). I wrote some more in my recent access keys patch.

Steven Wittens
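A rough sketch of what such a lowercasing wrapper could look like. The function names are hypothetical and this is not the code from any patch; it only illustrates the "mbstring if present, ASCII-only otherwise" behaviour described above.

```php
<?php
/**
 * ASCII-only lowercase (hypothetical helper name).
 *
 * Bytes of multi-byte UTF-8 sequences are all >= 0x80, so mapping only
 * A-Z can never corrupt a non-ASCII character.
 */
function example_ascii_strtolower($text) {
  return strtr($text, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

/**
 * Lowercase a UTF-8 string (hypothetical wrapper name).
 */
function example_utf8_strtolower($text) {
  if (function_exists('mb_strtolower')) {
    // mbstring handles case mapping beyond ASCII.
    return mb_strtolower($text, 'UTF-8');
  }
  // Poor man's version: non-ASCII characters are left untouched.
  return example_ascii_strtolower($text);
}
```

Without mbstring, a string like "ÉCOLE" would come back as "École" bytes-wise intact but with the É still uppercase, which is imperfect for search but never produces invalid UTF-8.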