[drupal-devel] Search.module and minimum version of PHP

Steven Wittens steven at acko.net
Mon Mar 7 00:11:31 UTC 2005

Some of you may have noticed that the 4.6 RC announcement has some 
comments about the PHP version in it. The cause is the new search.module, 
which requires Unicode/UTF-8 support in the Perl-compatible regular 
expressions (PCRE) library. The PHP documentation says UTF-8 support was 
added to PCRE with PHP 4.1 on Unix and PHP 4.2.3 on Windows (and this is 
why I thought it was reasonable to use it).

But it seems that the /real/ support wasn't added until PHP 4.3.3. I've 
done some testing with Gerhard, and as far as I can tell the UTF-8 
support in the PCRE in PHP 4.3.2 (or earlier) is pretty broken on both 
Windows and Linux.

The reason behind all this is that the search now supports characters in 
the entire Unicode range when it splits up text into words. This is in 
fact important for every language, as more and more 'high' Unicode 
characters are used every day (anything outside ISO-8859-1/Latin-1, e.g. 
smart/curly quotes, the euro sign, math symbols, and any language that 
does not use the accented Latin script). In theory we could convert the 
character-based regular expression into a byte-based one, but this would 
require an insane amount of coding to do the conversion 
programmatically. The result would be a truly monstrous regular 
expression, so I really don't see it happening in practice.
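
To make the character-based versus byte-based difference concrete, here is a rough sketch. It is written in Python rather than PHP, it is not taken from search.module, and the names CHAR_PATTERN, BYTE_PATTERN, count_char_matches and count_byte_matches are made up for the illustration. It only covers one small range (U+0080 to U+07FF), where a single character class corresponds to exactly one two-byte pattern over UTF-8 input.

```python
import re

# Character-based: one class covers every code point from U+0080 to U+07FF.
CHAR_PATTERN = re.compile("[\u0080-\u07ff]")

# Byte-based equivalent for the same range over UTF-8 input: a lead byte
# 0xC2-0xDF followed by exactly one continuation byte 0x80-0xBF.
BYTE_PATTERN = re.compile(rb"[\xC2-\xDF][\x80-\xBF]")


def count_char_matches(text: str) -> int:
    """Count matches using the character-based pattern."""
    return len(CHAR_PATTERN.findall(text))


def count_byte_matches(text: str) -> int:
    """Count matches using the byte-based pattern on the UTF-8 encoding."""
    return len(BYTE_PATTERN.findall(text.encode("utf-8")))
```

For valid UTF-8 input both functions agree, e.g. they both count the two accented letters in "café €5 naïve" and skip the euro sign, which lies outside this range. Ranges that reach into three-byte and four-byte sequences each need further alternatives with their own lead-byte and continuation-byte constraints, which is where the converted expression starts to grow out of hand.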

Or we could write our own UTF-8 compatible tokenizer, but again this 
would be a large piece of code that is slow to boot.
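
For a sense of what such a tokenizer involves, here is a minimal sketch, again in Python and purely illustrative. The function names decode_utf8 and tokenize are hypothetical, the decoder is simplified (it does not reject every malformed sequence, such as overlong forms), and treating Unicode categories L* and N* as "word characters" is an assumption made for the example, not the rule search.module uses.

```python
import unicodedata


def decode_utf8(data: bytes):
    """Yield code points from UTF-8 bytes; bytes that do not fit are skipped."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # single-byte (ASCII)
            cp, extra = b, 0
        elif 0xC2 <= b <= 0xDF:           # lead byte of a 2-byte sequence
            cp, extra = b & 0x1F, 1
        elif 0xE0 <= b <= 0xEF:           # lead byte of a 3-byte sequence
            cp, extra = b & 0x0F, 2
        elif 0xF0 <= b <= 0xF4:           # lead byte of a 4-byte sequence
            cp, extra = b & 0x07, 3
        else:                             # stray continuation or invalid byte
            i += 1
            continue
        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra or any(t & 0xC0 != 0x80 for t in tail):
            i += 1                        # truncated or broken sequence
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        if cp > 0x10FFFF:                 # beyond the Unicode range
            i += 1
            continue
        i += 1 + extra
        yield cp


def tokenize(data: bytes):
    """Split UTF-8 bytes into words made of letters and digits."""
    words, current = [], []
    for cp in decode_utf8(data):
        ch = chr(cp)
        # Assumption: categories starting with L (letter) or N (number)
        # count as word characters; everything else separates words.
        if unicodedata.category(ch)[0] in "LN":
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

Even this stripped-down version walks the input one byte at a time in interpreted code, which is the performance concern, and a real implementation would also need its own character tables where no Unicode database is available.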

An alternative is to ignore high Unicode characters when searching. 
This means that sites with Western European content will still be 
indexed in a somewhat working fashion (they will just behave badly around 
curly quotes and euro signs), but any other language will be broken. It is 
certainly not something we can implement in Drupal core, but I can make 
a patch which does this for those that are stuck on an old PHP install.
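
One possible reading of this fallback, sketched in Python for illustration only: split words with a byte-based pattern that knows about ASCII letters and digits plus the UTF-8 encoding of the Latin-1 letters, and treat every other byte as a separator. This is an assumption about how such a patch might work, not the patch itself, and LATIN1_WORD and split_words_latin1 are hypothetical names.

```python
import re

# Word bytes: ASCII letters/digits, or a UTF-8 encoded Latin-1 letter
# (U+00C0-U+00FF, lead byte 0xC3), leaving out U+00D7 (multiplication sign)
# and U+00F7 (division sign), which are not letters.
LATIN1_WORD = re.compile(
    rb"(?:[A-Za-z0-9]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF])+"
)


def split_words_latin1(text: str):
    """Return words using only ASCII and Latin-1 letters; the rest is dropped."""
    data = text.encode("utf-8")
    return [m.decode("utf-8") for m in LATIN1_WORD.findall(data)]
```

With this sketch, "café costs €5" yields the words café, costs and 5, while a Greek word such as "Ελληνικά" yields nothing at all, which matches the trade-off described above: Western European text mostly survives, anything else is lost.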

Or we could just say "search.module and thus Drupal requires PHP 4.3.3".

What do you guys think about it?

