[drupal-devel] Search.module and minimum version of PHP
Steven Wittens
steven at acko.net
Mon Mar 7 00:11:31 UTC 2005
Some of you may have noticed that the 4.6 RC announcements have some
comments about the PHP version in them. The cause is the new search.module,
which requires Unicode/UTF-8 support in the Perl-compatible regular
expressions (PCRE) library. The PHP documentation says this support was
added to PCRE with PHP 4.1 on Unix and PHP 4.2.3 on Windows (and this is
why I thought it was reasonable to use it).
But it seems that the /real/ support wasn't added until PHP 4.3.3. I've
done some testing with Gerhard and the UTF-8 support in the PCRE in PHP
4.3.2 (or earlier) is pretty broken as far as I can tell on both Windows
and Linux.
The reason behind all this is that the search now supports characters in
the entire Unicode range when it splits up text into words. This is in
fact important for every language, as more and more 'high' Unicode
characters are used every day (anything outside ISO-8859-1/Latin-1, e.g.
smart/curly quotes, the euro sign, math symbols, and any language that
does not use the accented Latin script). In theory we could convert the
character-based regular expression into a byte-based one, but this would
require an insane amount of coding to do the conversion
programmatically. The result would be a truly monstrous regular
expression, so I really don't see it happening in practice.
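To give a feel for the blow-up, here is a minimal sketch in Python (not the
PHP code in search.module) of the simplest possible case: a single
character class meaning "any character above ASCII", next to its byte-based
equivalent over raw UTF-8. The names CHAR_PATTERN and BYTE_PATTERN are made
up for this illustration. The byte ranges follow the standard table of
well-formed UTF-8 sequences.

```python
import re

# Character-based: one class covers every code point above ASCII.
CHAR_PATTERN = re.compile("[\u0080-\U0010FFFF]")

# Byte-based: the same set, spelled out as well-formed UTF-8 sequences.
# Each alternative is one row of the UTF-8 encoding table.
BYTE_PATTERN = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"                # 2 bytes: U+0080..U+07FF
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"           # 3 bytes, excluding overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"    # 3 bytes, general case
    rb"|\xED[\x80-\x9F][\x80-\xBF]"           # 3 bytes, excluding surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"        # 4 bytes, excluding overlongs
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"            # 4 bytes, general case
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"        # 4 bytes, up to U+10FFFF
)

text = "price: 5\u20ac \u201cok\u201d"  # euro sign and curly quotes
by_char = CHAR_PATTERN.findall(text)
by_byte = [m.decode("utf-8") for m in BYTE_PATTERN.findall(text.encode("utf-8"))]
```

Both patterns find the same three characters here, but one range already
needs seven alternatives in byte form. A word-splitting expression with many
ranges scattered across the Unicode range would need this expansion for
every range.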
Or we could write our own UTF-8 compatible tokenizer, but again this
would be a large piece of code that is slow to boot.
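As a rough idea of what such a tokenizer involves, here is a minimal sketch
in Python rather than PHP. It assumes well-formed UTF-8 input and does no
validation of continuation bytes. The function names and the rule "letters
and digits are word characters" are assumptions made for this illustration,
not what search.module does.

```python
import unicodedata


def decode_utf8(data: bytes):
    """Yield code points from UTF-8 bytes (assumes well-formed input)."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: ASCII
            length, cp = 1, lead
        elif lead >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
            length, cp = 4, lead & 0x07
        else:
            raise ValueError("unexpected byte 0x%02X at offset %d" % (lead, i))
        # Fold in the low 6 bits of each continuation byte.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += length


def tokenize(data: bytes):
    """Split UTF-8 text into words made of letters and digits."""
    words, current = [], []
    for cp in decode_utf8(data):
        ch = chr(cp)
        # Assumption: Unicode categories L* (letters) and N* (numbers)
        # form words; everything else separates them.
        if unicodedata.category(ch)[0] in "LN":
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


words = tokenize("\u201cGr\u00f6\u00dfe\u201d costs 5\u20ac".encode("utf-8"))
```

Even this toy version walks the input byte by byte in interpreted code, and
a real one would also need input validation and its own character tables,
which is where the size and the slowness come from.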
An alternative is to ignore high Unicode characters when searching.
This means that sites with Western European content will still be
indexed in a somewhat working fashion (they will just behave badly around
curly quotes and euro signs), but any other language will be broken. It is
certainly not something we can implement in Drupal core, but I can make
a patch which does this for those that are stuck on an old PHP install.
Or we could just say "search.module and thus Drupal requires PHP 4.3.3".
What do you guys think about it?
Steven