[development] mysterious search issue

Mark Leicester mark.leicester at efurbishment.com
Tue Apr 3 15:12:16 UTC 2007


Hi Alan,
Is there any chance it's to do with this: http://drupal.org/node/90128 ?
Hope this helps!
Mark

On 3 Apr 2007, at 15:58, Alan Dixon wrote:

> I think I've just figured out a problem with a site I'm working on and
> wanted some wisdom from this list. The site is
> http://community.telecentre.org/ (not that it matters).
>
> The problem:
>
> The problem was that the search module's database stopped getting
> updated (i.e., new material wasn't showing up in searches). I looked
> at the search_dataset table and discovered that the biggest nid (i.e.,
> sid) was from a node that was published about 9 months ago (hmm, seems
> like most folks don't believe a site search anyway?).
>
> The diagnosis:
>
> So, I ran some debugging and discovered that the sql in
> node_update_index (the one that tells search whether there are any new
> nodes to spider) was returning no rows all the time, even though there
> was lots of new content. After struggling with the logic in the SQL, I
> think I figured out that the problem was a single node which had
> gotten a date of May 2007 in the created field. I don't think that's
> normally a problem, but the node_update_shutdown function (which is
> invoked in case search gets aborted because it runs out of time) saves
> the system variables node_cron_last and node_cron_last_nid as the
> current node's created and nid values.
>
> Conclusion: I think what happened was that the search indexer got
> aborted while processing a node with a future date. That inserted a
> future value into node_cron_last, which means that nodes don't get
> spidered again until that date.
>
> Question: (multiple choice to make it easy ...)
>
> 1. is this a problem with the node_update_shutdown logic (or the point
> in node_update_index when the last_change global gets set for it)?
>
> 2. Or is it a bug in the aggregator2 module that creates nodes with
> 'created' set in the future?
>
> 3. Or have I misdiagnosed the problem?
>
> 4. All of the above ...
>
> Comments:
>
> I've heard of other mysterious search indexing failures like this. It
> took me quite a while to figure out what was going on - the logic that
> decides which nodes get spidered is pretty complex. Does anyone have any handy
> tools for such search problem diagnosis? Sounds like a useful addition
> to the devel module or as a separate one. Something that can explain
> how many and which nodes will be spidered by the next cron perhaps ...
>
>
>
> -- 
> Alan Dixon, Web Developer
> http://alan.g.dixon.googlepages.com/
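
The failure mode described in the quoted message can be sketched with a small model. This is not Drupal's code: it is a simplified Python illustration that assumes the indexer picks nodes whose timestamp is later than node_cron_last (with node_cron_last_nid as a tie-breaker) and checkpoints both values after each node. The names Node, pending_nodes and run_cron are hypothetical. pending_nodes also doubles as the kind of "what will the next cron spider?" diagnostic asked about above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    nid: int
    timestamp: int  # Unix time the indexer orders by (assumed for this model)


def pending_nodes(nodes, cron_last, cron_last_nid, limit=100):
    """Return the nodes the next cron run would index under the assumed rule."""
    due = [
        n for n in nodes
        if n.timestamp > cron_last
        or (n.timestamp == cron_last and n.nid > cron_last_nid)
    ]
    # Oldest first, nid as tie-breaker, capped at the per-run limit.
    return sorted(due, key=lambda n: (n.timestamp, n.nid))[:limit]


def run_cron(nodes, state, limit=100):
    """Index one batch and checkpoint after each node, as a shutdown hook would."""
    batch = pending_nodes(
        nodes, state["node_cron_last"], state["node_cron_last_nid"], limit
    )
    for node in batch:
        state["node_cron_last"] = node.timestamp
        state["node_cron_last_nid"] = node.nid
    return [n.nid for n in batch]


# Hypothetical data: node 2 carries a timestamp far in the future.
nodes = [Node(1, 900), Node(2, 5000)]
state = {"node_cron_last": 0, "node_cron_last_nid": 0}
first_run = run_cron(nodes, state)

# Content created afterwards has ordinary timestamps below the checkpoint.
nodes += [Node(3, 1010), Node(4, 1020)]
stalled = pending_nodes(
    nodes, state["node_cron_last"], state["node_cron_last_nid"]
)
```

In this model the future-dated node sorts last, so it becomes the checkpoint and every later node with a normal timestamp is skipped until the clock passes that date. Under these assumptions, resetting the two variables (or correcting the node's date) would be what lets indexing resume.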


