Hi Alan,

Is there any chance it's to do with this: http://drupal.org/node/90128 ?

Hope this helps!

Mark

On 3 Apr 2007, at 15:58, Alan Dixon wrote:
I think I've just figured out a problem with a site I'm working on and wanted some wisdom from this list. The site is http://community.telecentre.org/ (not that it matters).
The problem:
The problem was that the search module's database stopped getting updated (i.e., new material wasn't showing up in searches). I looked at the search_dataset table and discovered that the biggest nid (i.e., sid) was from a node that was published about 9 months ago (hmm, seems like most folks don't believe a site search anyway?).
The diagnosis:
So, I ran some debugging and discovered that the SQL in node_update_index (the query that tells search whether there are any new nodes to spider) was always returning no rows, even though there was lots of new content. After struggling with the logic in the SQL, I think I figured out that the problem was a single node which had gotten a date of May 2007 in the created field. I don't think that's normally a problem, but the node_update_shutdown function (which is invoked in case search gets aborted because it runs out of time) saves the system variables node_cron_last and node_cron_last_nid as the current node's created and nid values.
Conclusion: I think what happened was that the search indexer got aborted while processing a node with a future date. That inserted a future value into node_cron_last, which means that nodes don't get spidered again until that date.
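To make the suspected failure mode concrete, here is a minimal sketch in Python. It is a simplified model of the behaviour described above, not Drupal's actual query: the `Node` class, the `pending_nodes` helper and the sample timestamps are all hypothetical, and the real SQL in node_update_index is more involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    nid: int
    created: int  # Unix timestamp


def pending_nodes(nodes, cron_last, cron_last_nid):
    """Return the nodes a simplified indexer would pick up next.

    Assumption: a node qualifies if it is newer than the saved
    watermark, or has the same timestamp and a larger nid.
    """
    picked = [
        n for n in nodes
        if n.created > cron_last
        or (n.created == cron_last and n.nid > cron_last_nid)
    ]
    return sorted(picked, key=lambda n: (n.created, n.nid))


NOW = 1175558400     # 3 Apr 2007, 00:00 UTC
FUTURE = 1179187200  # 15 May 2007, 00:00 UTC

nodes = [
    Node(1, NOW - 100),
    Node(2, FUTURE),    # the node with a 'created' date in the future
    Node(3, NOW - 50),
    Node(4, NOW - 10),
]

# Healthy watermark: everything after node 1 is still queued.
healthy = pending_nodes(nodes, cron_last=NOW - 100, cron_last_nid=1)

# Watermark saved while node 2 was being processed: nothing qualifies
# until the clock passes FUTURE.
stalled = pending_nodes(nodes, cron_last=FUTURE, cron_last_nid=2)

print([n.nid for n in healthy])
print([n.nid for n in stalled])
```

In this model, once the future timestamp is stored as the watermark, nodes 3 and 4 are never selected even though they have not been indexed, which matches the symptom.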
Question: (multiple choice to make it easy ...)
1. Is this a problem with the node_update_shutdown logic (or the point in node_update_index when the last_change global gets set for it)?
2. Or is it a bug in the aggregator2 module that creates nodes with 'created' set in the future?
3. Or have I misdiagnosed the problem?
4. All of the above ...
Comments:
I've heard of other mysterious search indexing failures like this. It took me quite a while to figure out what was going on - the logic that decides which nodes get spidered is pretty complex. Does anyone have any handy tools for diagnosing such search problems? Sounds like a useful addition to the devel module or as a separate one. Something that can explain how many and which nodes will be spidered by the next cron perhaps ...
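As a rough idea of what such a diagnostic might report, here is a small Python sketch. Everything in it is hypothetical: the `diagnose` function, its report keys and the `(nid, created)` tuples stand in for values that a real tool would read from the node table and from the node_cron_last and node_cron_last_nid variables, and the selection rule is a simplification rather than Drupal's real query.

```python
import time


def diagnose(nodes, cron_last, cron_last_nid, now=None):
    """Summarise what a simplified indexer would do on the next cron run.

    nodes is a list of (nid, created) tuples; created is a Unix timestamp.
    """
    now = time.time() if now is None else now
    # Same simplified rule as the indexer: newer than the watermark,
    # or equal timestamp with a larger nid.
    pending = sorted(
        (created, nid) for nid, created in nodes
        if created > cron_last
        or (created == cron_last and nid > cron_last_nid)
    )
    return {
        # A watermark ahead of the clock means indexing is stalled.
        "watermark_in_future": cron_last > now,
        "pending_nids": [nid for _, nid in pending],
        # Nodes that could poison the watermark if a run is aborted on them.
        "future_dated_nids": sorted(nid for nid, c in nodes if c > now),
    }


report = diagnose(
    nodes=[(1, 1000), (2, 9000), (3, 2000)],
    cron_last=9000,
    cron_last_nid=2,
    now=3000,
)
print(report)
```

A report with `watermark_in_future` set and an empty `pending_nids` list would point straight at the situation described above.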
-- Alan Dixon, Web Developer http://alan.g.dixon.googlepages.com/