[drupal-devel] [bug] search.module doesn't search non-english words without mbstring php-module

Steven drupal-devel at drupal.org
Mon Mar 28 17:06:51 UTC 2005


Issue status update for http://drupal.org/node/19575

 Project:      Drupal
 Version:      4.5.2
 Component:    search.module
 Category:     bug reports
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  edhel
 Updated by:   Steven
 Status:       active

This regular expression is intended to be byte-based, not
character-based. Is there a way to force us to use the plain
(non-overloaded) version of this function?
I suppose preg_replace is overridden as well? Stupid PHP... when will
they learn that bytes are not the same as characters :(
What happens when you try the following:

<?php
preg_replace("/[^\x80-\xF7 [:alnum:]@_.-]/", '', $name);
?>
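
For illustration, here is a minimal sketch of what this byte-based character class does when it is handed to preg_replace() with an explicit empty replacement. The function name strip_illegal_username_bytes() is hypothetical and not part of Drupal, and the sketch assumes the pattern is used without the /u modifier, so that PCRE treats it as a sequence of bytes.

```php
<?php
// Hypothetical helper, not part of Drupal: removes every byte that is not
// in the allowed set. "\x80" and "\xF7" are expanded by PHP into raw bytes
// before PCRE sees the pattern, so the range is byte based.
function strip_illegal_username_bytes($name) {
  return preg_replace("/[^\x80-\xF7 [:alnum:]@_.-]/", '', $name);
}

// "é" is the two UTF-8 bytes 0xC3 0xA9; both fall inside 0x80-0xF7 and survive,
// while "!" and "#" are outside the allowed set and are removed.
$cleaned = strip_illegal_username_bytes("jos\xC3\xA9!#");
echo bin2hex($cleaned), "\n"; // 6a6f73c3a9
```

If mbstring overloading changes the behaviour of the regex functions, the output of a test like this should make that visible.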




Steven



Previous comments:
------------------------------------------------------------------------

March 28, 2005 - 06:45 : edhel

Search in Drupal works with non-English (national-language) words only
under the following conditions:
1) the mbstring PHP module is enabled
2) PHP is configured with mbstring.func_overload = 7 (or at least 2)
3) PHP is configured with mbstring.internal_encoding = "UTF-8"
Cause: search.module uses strtolower(), which handles UTF-8 incorrectly
unless mbstring is configured as above.
Options 2 and 3 can be set in .htaccess. Alternatively, these options
can be avoided by replacing every strtolower() call in search.module
with mb_strtolower().
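
The difference between the two functions can be seen in a short sketch. It assumes a UTF-8 encoded source file and the default "C" locale; the sample word is only an illustration. mb_strtolower() exists only when the mbstring extension is loaded, hence the guard.

```php
<?php
// Illustrative sample: the Cyrillic word "ПРИВЕТ" encoded as UTF-8.
$word = "ПРИВЕТ";

// strtolower() works byte by byte and only knows single-byte letters, so in
// the "C" locale the multi-byte UTF-8 sequences are left in upper case.
$plain = strtolower($word);

// mb_strtolower() decodes the string as UTF-8 and lower-cases whole characters.
// It is skipped (null) when the mbstring extension is not available.
$multibyte = function_exists('mb_strtolower') ? mb_strtolower($word, 'UTF-8') : null;

var_dump($plain === $word);        // true when strtolower() left the word unchanged
var_dump($multibyte === "привет"); // true when mbstring lower-cased it
```

Since the search index is built from lower-cased words, a word that is never lower-cased at index time will not match a query that is, which would be consistent with the behaviour described above.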
This "feature" is also present in 4.6 RC1.
IMHO it must be fixed in some way or DOCUMENTED IN INSTALL.TXT AND IN
THE HANDBOOK (in the "installation" section).
Note: with options 2 and 3 set, Drupal produces some warnings:
1) in includes/menu.inc:910 (see: http://drupal.org/node/11758)
2) in modules/user.module:214. To make it work correctly I replaced:

<?php
if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
?>


with this code:

<?php
if (ereg("[^\\x80-\\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
?>
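
The two versions differ only in how PHP parses the string literal before any regular expression function sees it. A small sketch of that difference (the variable names are illustrative only):

```php
<?php
// In a double-quoted string, "\x80" is an escape sequence: PHP turns it into
// the single raw byte 0x80 before any regex function receives the pattern.
$raw = "\x80";

// "\\x80" is an escaped backslash followed by the characters x, 8, 0, so the
// regex engine receives the four-character text \x80 and has to interpret it.
$escaped = "\\x80";

echo strlen($raw), ' ', bin2hex($raw), "\n";         // 1 80
echo strlen($escaped), ' ', bin2hex($escaped), "\n"; // 4 5c783830
```

Whether the four-character form is then understood as a hexadecimal escape depends on the regex engine behind the call (plain ereg() or its mbstring-overloaded counterpart), so that part should be checked against the function actually in use.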




------------------------------------------------------------------------

March 28, 2005 - 17:59 : Steven

The 4.6 search.module is significantly different from 4.5.x and supports
UTF-8 even without mbstring (though you will get better results on
non-Latin languages with it). Marking as won't fix, because this was the
main point of this issue.
I fixed the user notice.




