[support] Indexing of really short words

Yannick Gingras yannick.gingras at savoirfairelinux.com
Thu Jan 10 15:32:13 UTC 2008


Greetings,  
  We have a Drupal site with ~600,000 articles.  Indexing takes quite a
lot of time, but what puzzles me is that the MySQL slow query log shows:

  SELECT sum(score) FROM search_index WHERE word = 'S';
  
When I do

  mysqladmin processlist

I see many such queries taking a lot of time on two-character words.
My configuration is set to index only words of three or more
characters.  Why do I see queries with two-character words?  I can't
recall ever having set the word length parameter to two, but could it
be that my search_index is polluted with short words?  Is it safe to
simply delete the short words from the index?
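One way to check for such pollution before deleting anything is to count the indexed words that fall below the minimum length. The sketch below is a hypothetical, self-contained illustration: it uses Python's built-in sqlite3 module and a toy `search_index` table with only the `word` and `score` columns visible in the query above, not Drupal's full schema. Whether deleting these rows is safe for Drupal is the open question here; the sketch only shows the mechanics. On MySQL the length test would be written with `CHAR_LENGTH(word)`, and the table should be backed up before any `DELETE`.

```python
import sqlite3

MIN_WORD_LENGTH = 3  # assumption: mirrors the site's minimum word length setting


def short_word_counts(conn, min_len=MIN_WORD_LENGTH):
    """Return (word, row_count) pairs for indexed words shorter than min_len."""
    cur = conn.execute(
        "SELECT word, COUNT(*) FROM search_index "
        "WHERE LENGTH(word) < ? GROUP BY word ORDER BY word",
        (min_len,),
    )
    return cur.fetchall()


def delete_short_words(conn, min_len=MIN_WORD_LENGTH):
    """Delete index rows for words shorter than min_len; return rows removed."""
    cur = conn.execute("DELETE FROM search_index WHERE LENGTH(word) < ?", (min_len,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Toy in-memory table standing in for the real search_index (hypothetical data).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_index (word TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO search_index VALUES (?, ?)",
        [("s", 1.0), ("of", 2.0), ("of", 1.0), ("linux", 3.0), ("now", 1.0)],
    )
    print(short_word_counts(conn))  # [('of', 2), ('s', 1)]
    print(delete_short_words(conn))  # 3
```

Running the counting query alone first shows how many rows, and which words, a cleanup would touch.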

Furthermore, I see many slow queries on generic words like "and",
"not", "now", and the like.  Is there a way to tell Drupal not to
index those?
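Conceptually, not indexing such words means filtering tokens before they are written to the index. The following is a hypothetical Python sketch of that idea, not Drupal's indexer (which is written in PHP); `MIN_WORD_LENGTH`, `STOP_WORDS`, and `index_tokens` are illustrative names, and the stop-word list is only an example built from the words mentioned above.

```python
import re

MIN_WORD_LENGTH = 3  # hypothetical: words shorter than this are skipped
STOP_WORDS = {"and", "not", "now", "the"}  # hypothetical example stop-word list


def index_tokens(text, min_len=MIN_WORD_LENGTH, stop_words=STOP_WORDS):
    """Lower-case, split on non-word characters, and drop short words and stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in stop_words]


print(index_tokens("Now is not the time for GNU and Linux, is it?"))
# ['time', 'for', 'gnu', 'linux']
```

Filtering at tokenization time keeps both short words and stop words out of the table entirely, so no later query ever has to sum scores over them.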

Any other advice that would help me speed up indexing would be much
appreciated.  At the moment, indexing is one of our major bottlenecks.

Best regards, 

-- 
Yannick Gingras
Consultant GNU/Linux et logiciel libre
Savoir-faire Linux
