[drupal-devel] [bug] search.module doesn't search non-english words
without mbstring php-module
Steven
drupal-devel at drupal.org
Mon Mar 28 17:06:51 UTC 2005
Issue status update for http://drupal.org/node/19575
Project: Drupal
Version: 4.5.2
Component: search.module
Category: bug reports
Priority: normal
Assigned to: Anonymous
Reported by: edhel
Updated by: Steven
Status: active
It is the intention of this regular expression to be byte based, not
character based. Is there a way to force us to use the plain version of
this function?
I suppose preg_replace is overridden as well? Stupid PHP... when will
they learn that bytes are not the same as characters :(
What happens when you try the following:
<?php
// preg_replace() takes (pattern, replacement, subject); strip the non-matching bytes.
preg_replace("/[^\x80-\xF7 [:alnum:]@_.-]/", "", $name);
?>
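To make the byte-based intent concrete, here is a minimal sketch of the same character class used as a check rather than a replacement. The function name example_has_illegal_bytes() is hypothetical and not part of user.module. It assumes the pattern is compiled without the /u modifier, so PCRE treats \x80-\xF7 as single bytes, and that the default C locale is in effect so [:alnum:] covers ASCII only.

```php
<?php
// Hypothetical helper, for illustration only (not Drupal code).
// Returns TRUE when $name contains a byte outside the allowed set:
// bytes 0x80-0xF7 (which covers UTF-8 multi-byte sequences), space,
// ASCII letters and digits, and the characters @ _ . -
function example_has_illegal_bytes($name) {
  // Single quotes: PHP passes \x80 through untouched and PCRE interprets it.
  // No /u modifier, so matching is done per byte, not per character.
  return preg_match('/[^\x80-\xF7 [:alnum:]@_.-]/', $name) === 1;
}
```

With this sketch, a UTF-8 name such as "\xC3\xA9dhel" passes because both bytes of the accented letter fall in the 0x80-0xF7 range, while a name containing "!" is rejected.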
Steven
Previous comments:
------------------------------------------------------------------------
March 28, 2005 - 06:45 : edhel
Search in Drupal works with non-English words only under the following
conditions:
1) php-module mbstring is switched on
2) php configured with mbstring.func_overload = 7 (or at least 2)
3) php configured with mbstring.internal_encoding "UTF-8"
Cause: search.module uses strtolower(), which handles UTF-8 incorrectly
(unless mbstring is configured as above).
Options 2-3 can be set in .htaccess. Alternatively, they can be avoided
altogether by replacing every strtolower() call in search.module with
mb_strtolower().
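The replacement approach could look roughly like the sketch below. The wrapper name example_utf8_strtolower() is hypothetical and does not exist in search.module. It assumes the input is UTF-8 and simply falls back to strtolower() when the mbstring extension is not loaded.

```php
<?php
// Hypothetical wrapper, for illustration only (not Drupal code).
// Lowercases a UTF-8 string with mbstring when the extension is loaded.
function example_utf8_strtolower($text) {
  if (function_exists('mb_strtolower')) {
    // Passing the encoding explicitly avoids depending on
    // mbstring.internal_encoding or mbstring.func_overload.
    return mb_strtolower($text, 'UTF-8');
  }
  // Fallback: without mbstring only ASCII letters are reliably lowercased.
  return strtolower($text);
}
```

Calling the wrapper instead of strtolower() keeps ASCII behaviour unchanged and, where mbstring is available, also lowercases non-ASCII letters without any php.ini or .htaccess changes.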
This "feature" is also present in 4.6 RC1.
IMHO it must be fixed in some way or DOCUMENTED IN INSTALL.TXT AND IN
HANDBOOK (in "installation" section).
Note: with options 2-3 enabled, some warnings appear in Drupal:
1) in includes/menu.inc:910 (see: http://drupal.org/node/11758)
2) in modules/user.module:214. To make it work correctly, I replaced:
<?php
if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
?>
with this code:
<?php
if (ereg("[^\\x80-\\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
?>
------------------------------------------------------------------------
March 28, 2005 - 17:59 : Steven
The 4.6 search.module is significantly different from 4.5.x and supports
UTF-8 even without mbstring (though you will get better results on
non-Latin languages with it). Marking as won't fix, because this was the
main point of this issue.
I fixed the user notice.