[development] mysterious search issue

Robert Douglass rob at robshouse.net
Tue Apr 3 15:10:17 UTC 2007


Alan,

Nice report. Is there an issue in the issue queue for this? It seems a 
simple matter to add a clause to the query to ignore future nodes, and 
this isn't the first time I've heard of the problem. I'd be happy to 
work on this if nobody already is. Point me to an existing issue or one 
that you create. Thanks.
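
To make the idea concrete, here is a minimal sketch in Python (not Drupal's PHP, and not the actual node_update_index query). It assumes a simplified model in which each node has a single timestamp and the indexer keeps a (last, last_nid) checkpoint like node_cron_last and node_cron_last_nid. The names Node, nodes_to_index and the now parameter are hypothetical, invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    nid: int
    timestamp: int  # stand-in for the node's date field (assumption)


def nodes_to_index(nodes, last, last_nid, now=None):
    """Return nodes past the (last, last_nid) checkpoint, oldest first.

    If now is given, nodes dated after now are skipped. That is the
    hypothetical extra clause that ignores future nodes.
    """
    pending = [
        n for n in nodes
        if (n.timestamp > last or (n.timestamp == last and n.nid > last_nid))
        and (now is None or n.timestamp <= now)
    ]
    return sorted(pending, key=lambda n: (n.timestamp, n.nid))


NOW = 1_175_612_000  # an arbitrary "current" Unix time for the example
nodes = [
    Node(nid=1, timestamp=NOW - 300),
    Node(nid=2, timestamp=NOW + 2_600_000),  # dated about a month ahead
    Node(nid=3, timestamp=NOW - 100),
]

# Checkpoint taken from the future-dated node: nothing qualifies any more.
stuck = nodes_to_index(nodes, last=NOW + 2_600_000, last_nid=2)

# Sane checkpoint plus the extra clause: the future node is simply skipped.
fixed = nodes_to_index(nodes, last=NOW - 1000, last_nid=0, now=NOW)
```

In this model, stuck is empty while fixed holds nodes 1 and 3. Because the future node never becomes the "current" node, its date can never be saved as the checkpoint.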

Robert Douglass

Alan Dixon wrote:
> I think I've just figured out a problem with a site I'm working on and
> wanted some wisdom from this list. The site is
> http://community.telecentre.org/ (not that it matters).
>
> The problem:
>
> The problem was that the search module's database stopped getting
> updated (i.e., new material wasn't showing up in searches). I looked
> at the search_dataset table and discovered that the biggest nid (i.e.,
> sid) was from a node that was published about 9 months ago (hmm, seems
> like most folks don't believe a site search anyway?).
>
> The diagnosis:
>
> So, I ran some debugging and discovered that the SQL in
> node_update_index (the one that tells search whether there are any new
> nodes to spider) was returning no rows all the time, even though there
> was lots of new content. After struggling with the logic in the SQL, I
> think I figured out that the problem was a single node which had
> gotten a date of May 2007 in the created field. I don't think that's
> normally a problem, but the node_update_shutdown function (which is
> invoked in case search gets aborted because it runs out of time) saves
> the system variables node_cron_last and node_cron_last_nid as the
> current node's created and nid values.
>
> Conclusion: I think what happened was that the search indexer got
> aborted while processing a node with a future date. That inserted a
> future value into node_cron_last, which means that nodes don't get
> spidered again until that date.
>
> Question: (multiple choice to make it easy ...)
>
> 1. is this a problem with the node_update_shutdown logic (or the point
> in node_update_index when the last_change global gets set for it)?
>
> 2. Or is it a bug in the aggregator2 module that creates nodes with
> 'created' set in the future?
>
> 3. Or have I misdiagnosed the problem?
>
> 4. All of the above ...
>
> Comments:
>
> I've heard of other mysterious search indexing failures like this. It
> took me quite a while to figure out what was going on - the logic in
> what nodes get spidered is pretty complex. Does anyone have any handy
> tools for such search problem diagnosis? Sounds like a useful addition
> to the devel module or as a separate one. Something that can explain
> how many and which nodes will be spidered by the next cron perhaps ...
>
>
>
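
As a rough illustration of the kind of diagnostic tool asked about above, the following Python sketch reports whether the saved checkpoint lies in the future and which nodes the next run would pick up. It is not an existing devel feature: the function diagnose, its arguments, and the limit batch size are all hypothetical, and the selection rule is a simplified stand-in for the real query.

```python
def diagnose(node_times, last, last_nid, now, limit=100):
    """Summarise what a checkpoint-based indexer would do on its next run.

    node_times maps nid -> timestamp; last and last_nid are the saved
    checkpoint; limit is an assumed per-run batch size.
    """
    # Nodes strictly past the checkpoint, ordered by (timestamp, nid).
    pending = sorted(
        (ts, nid) for nid, ts in node_times.items()
        if ts > last or (ts == last and nid > last_nid)
    )
    return {
        "checkpoint_in_future": last > now,
        "pending_total": len(pending),
        "next_batch": [nid for _, nid in pending[:limit]],
        "future_dated": sorted(nid for nid, ts in node_times.items() if ts > now),
    }


# Toy data: node 2 carries a date far ahead of "now".
node_times = {1: 900, 2: 5000, 3: 950}
report_stuck = diagnose(node_times, last=5000, last_nid=2, now=1000)
report_fresh = diagnose(node_times, last=0, last_nid=0, now=1000, limit=2)
```

A report with checkpoint_in_future set and pending_total at zero would match the symptom described in the quoted message, and future_dated points at the nodes worth inspecting.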


