Issue status update for http://drupal.org/node/19575

 Project:      Drupal
 Version:      4.5.2
 Component:    search.module
 Category:     bug reports
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  edhel
 Updated by:   Steven
 Status:       active

It is the intention of this regular expression to be byte-based, not
character-based. Is there a way to force us to use the plain version of
this function? I suppose preg_replace is overridden as well?

Stupid PHP... when will they learn that bytes are not the same as
characters :(

What happens when you try the following:

<?php preg_replace("/[^\x80-\xF7 [:alnum:]@_.-]/", $name) ?>

Steven

Previous comments:
------------------------------------------------------------------------

March 28, 2005 - 06:45 : edhel

Search in Drupal works with national (non-English) words only under
these conditions:

1) the PHP module mbstring is enabled
2) PHP is configured with mbstring.func_overload = 7 (or at least 2)
3) PHP is configured with mbstring.internal_encoding "UTF-8"

Cause: search.module uses strtolower(), which handles UTF-8 incorrectly
(without mbstring configured). Options 2-3 can be set in .htaccess.
These options can also be replaced by another solution: replacing every
strtolower() with mb_strtolower() in the search.module file.

This "feature" is also present in 4.6 RC1. IMHO it must be fixed in some
way or DOCUMENTED IN INSTALL.TXT AND IN THE HANDBOOK (in the
"installation" section).

Notice: with options 2-3 there will be some warnings in Drupal:

1) in includes/menu.inc:910 (see: http://drupal.org/node/11758)
2) in modules/user.module:214.
To make it work correctly, I replaced:

<?php
if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
?>

with this code:

<?php
if (ereg("[^\\x80-\\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
?>

------------------------------------------------------------------------

March 28, 2005 - 17:59 : Steven

The 4.6 search.module is significantly different from 4.5.x and supports
UTF-8 even without mbstring (though you will get better results on
non-Latin languages with it).

Marking as won't fix, because this was the main point of this issue. I
fixed the user notice.
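
For readers following the thread, the two points discussed above
(character-aware lowercasing and the byte-based username pattern) can be
sketched roughly as follows. This is only an illustration, not the code
from search.module or user.module: the function names
example_utf8_lower() and example_has_illegal_byte() are hypothetical,
preg_match() is used here in place of ereg() as an assumption so the
sketch does not depend on the ereg extension, and the file is assumed to
be saved as UTF-8.

```php
<?php
// Hypothetical helpers for illustration; not part of Drupal.

// Lowercase a UTF-8 string by characters when mbstring is loaded,
// falling back to the byte-based strtolower() otherwise.
function example_utf8_lower($text) {
  if (function_exists('mb_strtolower')) {
    return mb_strtolower($text, 'UTF-8');
  }
  return strtolower($text);
}

// Byte-based username check. The pattern has no /u modifier, so PCRE
// compares single bytes, and the range \x80-\xF7 admits the bytes that
// make up multi-byte UTF-8 sequences.
function example_has_illegal_byte($name) {
  return preg_match('/[^\x80-\xF7 [:alnum:]@_.-]/', $name) === 1;
}

// In a double-quoted PHP string "\x80" is one raw byte, while "\\x80"
// is the four characters \, x, 8, 0, which are left for the regex
// engine to interpret.
var_dump(strlen("\x80"));   // int(1)
var_dump(strlen("\\x80"));  // int(4)
```

With these definitions, example_has_illegal_byte('bad!name') reports an
illegal character, while a plain ASCII name such as 'edhel' or a
UTF-8 encoded Cyrillic name passes. Whether the fallback to strtolower()
is acceptable depends on the site's content; it leaves non-ASCII letters
unchanged in the default locale.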