[drupal-devel] [bug] wrong search indexing in some cases

robertgarrigos drupal-devel at drupal.org
Mon Aug 22 14:52:30 UTC 2005


Issue status update for 
http://drupal.org/node/25923
Post a follow up: 
http://drupal.org/project/comments/add/25923

 Project:      Drupal
 Version:      4.5.5
 Component:    search.module
 Category:     bug reports
 Priority:     normal
 Assigned to:  robertgarrigos
 Reported by:  robertgarrigos
 Updated by:   robertgarrigos
 Status:       patch (ready to be committed)

You might be right. I was keeping one of my sites on 4.5.5 because I
misunderstood the system requirements: Drupal 4.6.x can in fact run on
PHP 4 (!).


However, if you keep releasing 4.5.x updates to fix security holes for
people running PHP < 4.3.3, I think it pays to debug that crappy search
module ;-)




robertgarrigos



Previous comments:
------------------------------------------------------------------------

Tue, 28 Jun 2005 15:16:10 +0000 : robertgarrigos

In some cases, the search module doesn't index some words correctly, for
instance when words are separated only by tags. In that case they are
indexed run together as a single word:


This is part of the text of a real node on one of my web pages (in Catalan):


1732/1735<br\><b>Instrumentació:</b>


This got indexed in the search_index table as:


17321735instrumentació        169        1


which means I couldn't get any search results for 'instrumentació'.


I fixed that by making search.module replace each stripped tag with a
space instead of an empty string:


original file (lines 253-254):
      // Strip heaps of stuff out of it.
      $wordlist = preg_replace("'<[\/\!]*?[^<>]*?>'si", '', $wordlist);


fixed file (lines 253-254):
      // Strip heaps of stuff out of it.
      $wordlist = preg_replace("'<[\/\!]*?[^<>]*?>'si", ' ', $wordlist);
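The effect of this one-character change can be checked in isolation. The sketch below runs the regex from lines 253-254 on the sample text from the report, once with '' and once with ' '. The strip_with() and split_words() helpers are hypothetical, and splitting on whitespace is a simplification for illustration, not how 4.5.5 actually breaks text into words.

```php
<?php
// Sketch: compare the original and patched tag-stripping lines.
// The regex is copied from search.module (lines 253-254); the helpers
// strip_with() and split_words() are hypothetical illustrations.
define('TAG_PATTERN', "'<[\/\!]*?[^<>]*?>'si");

// Replace every tag with $replacement ('' in 4.5.5, ' ' in the patch).
function strip_with($wordlist, $replacement) {
  return preg_replace(TAG_PATTERN, $replacement, $wordlist);
}

// Simplified tokenizer (assumption): split on runs of whitespace.
function split_words($text) {
  return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = '1732/1735<br\><b>Instrumentació:</b>';

$original = strip_with($text, '');   // tags vanish, words are glued together
$patched  = strip_with($text, ' ');  // tags become spaces, boundaries survive

print_r(split_words($original));  // one token: "1732/1735Instrumentació:"
print_r(split_words($patched));   // two tokens: "1732/1735", "Instrumentació:"
```

With the original line, "Instrumentació:" never appears as a separate token, which matches the glued-together entry seen in search_index.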




------------------------------------------------------------------------

Wed, 29 Jun 2005 21:12:59 +0000 : benshell

Have you tried this on 4.6.x? I read this issue because I'm also having
search indexing problems, but this particular problem looks like it has
been fixed in 4.6.1. On line 344 of search.module, I see this:


  // Strip off all ignored tags to speed up processing, but insert space before/after
  // them to keep word boundaries.
  $text = str_replace(array('<', '>'), array(' <', '> '), $text);
  $text = strip_tags($text, '<'. implode('><', array_keys($tags)) .'>');
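To make the 4.6.x approach concrete, here is a small standalone sketch that wraps the two quoted lines in a function and runs them on the same Catalan sample. The $tags array is a hypothetical stand-in (tag => weight) for the module's own list of tags to keep, and pad_and_strip() is a hypothetical name.

```php
<?php
// Sketch of the 4.6.x technique: pad every '<' and '>' with a space,
// then strip all tags except those listed in $tags.
// ASSUMPTION: this $tags map is illustrative; the real list and weights
// in 4.6's search.module may differ.
$tags = array('b' => 1, 'h1' => 25);

function pad_and_strip($text, $tags) {
  // Insert space before/after tags to keep word boundaries.
  $text = str_replace(array('<', '>'), array(' <', '> '), $text);
  // Keep only the allowed tags, here '<b><h1>'.
  return strip_tags($text, '<' . implode('><', array_keys($tags)) . '>');
}

$out = pad_and_strip('1732/1735<br\><b>Instrumentació:</b>', $tags);
echo $out, "\n";
```

Because the padding happens before strip_tags(), even the tags that get stripped (like the `<br\>` here) leave whitespace behind, so adjacent words can no longer be glued together, while the tags listed in $tags survive for any later processing.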




------------------------------------------------------------------------

Thu, 30 Jun 2005 06:02:29 +0000 : robertgarrigos

No, I haven't. The site where I have this problem is on a shared
server running PHP 4, so there is no way to get Drupal 4.6 on it.




------------------------------------------------------------------------

Mon, 22 Aug 2005 00:08:03 +0000 : robertgarrigos

This is still not fixed in 4.5.5. Apparently 4.6.x versions don't have
this problem.




------------------------------------------------------------------------

Mon, 22 Aug 2005 01:29:35 +0000 : robertgarrigos

Attachment: http://drupal.org/files/issues/search.module_0.patch (607 bytes)

I enclose a patch for this.


Please forgive me if this is not the right way of doing it. It's the
first time I'm using CVS on my Mac OS X machine, and also the first time
I'm using diff to produce a patch file, so take it as a simple "hello
world" patch; it should work and fix the problem anyway.




------------------------------------------------------------------------

Mon, 22 Aug 2005 08:02:01 +0000 : Dries

Robert: your patch looks OK, but I'll let Steven (UnConeD) review it.




------------------------------------------------------------------------

Mon, 22 Aug 2005 11:13:37 +0000 : Steven

This patch fixes the described issue, but do we want to bother
maintaining the 4.5 search? It's pretty darn crappy.






