Some of you may have noticed that the 4.6 RC announcement has some comments about the PHP version in it. The cause is the new search.module, which requires Unicode/UTF-8 support in the Perl-compatible regular expressions (PCRE) library.

The PHP documentation says UTF-8 compatibility was added to PCRE with PHP 4.1 on Unix and PHP 4.2.3 on Windows (and this is why I thought it was reasonable to use it). But it seems that the /real/ support wasn't added until PHP 4.3.3. I've done some testing with Gerhard, and the UTF-8 support in the PCRE in PHP 4.3.2 (or earlier) is pretty broken as far as I can tell, on both Windows and Linux.

The reason behind all this is that the search now supports characters in the entire Unicode range when it splits up text into words. This is in fact important for every language, as more and more 'high' Unicode characters are used every day (anything outside ISO-8859-1/Latin-1, e.g. smart/curly quotes, the euro sign, math symbols, and any language that does not use the accented Latin script).

In theory we could convert the character-based regular expression into a byte-based one, but this would require an insane amount of coding to do the conversion programmatically. The result would be a truly monstrous regular expression, so I really don't see it happening in practice.

Or we could write our own UTF-8 compatible tokenizer, but again this would be a large piece of code that is slow to boot.

An alternative is to ignore high Unicode characters in the searching. This means that sites with Western European content will still be indexed in a somewhat working fashion (they will just behave badly around curly quotes and euro signs), but any other language will be broken. It is certainly not something we can implement in Drupal core, but I can make a patch which does this for those who are stuck on an old PHP install.

Or we could just say "search.module and thus Drupal requires PHP 4.3.3".

What do you guys think about it?

Steven
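
To make the character-based versus byte-based distinction above concrete, here is a minimal sketch. It is written in Python rather than PHP and is not the search.module code; the function names and the sample string are made up for illustration. The first function splits words with a Unicode-aware pattern, and the second mimics the "ignore high Unicode characters" fallback by matching only ASCII word characters in the raw UTF-8 bytes.

```python
import re

# Hypothetical sample: curly quotes, an accented Latin letter,
# a euro sign and a Cyrillic word, written as escapes to avoid
# any ambiguity about how the characters are encoded.
SAMPLE = "the \u201ccaf\u00e9\u201d costs 5\u20ac, \u0441\u043b\u043e\u0432\u043e"


def split_words_unicode(text):
    """Character-based splitting: \\w is Unicode-aware for str patterns."""
    return re.findall(r"\w+", text)


def split_words_bytes(text):
    """Byte-based splitting: for bytes patterns \\w matches only ASCII
    letters, digits and underscore, so every multi-byte UTF-8 sequence
    is treated as a word separator."""
    raw = text.encode("utf-8")
    return [word.decode("ascii") for word in re.findall(rb"\w+", raw)]


if __name__ == "__main__":
    print(split_words_unicode(SAMPLE))
    print(split_words_bytes(SAMPLE))
```

With the byte-based version, the accented word is cut short and the Cyrillic word disappears from the result entirely, which is the kind of breakage described above for anything that is not plain Western European text.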