[development] mysterious search issue

Alan Dixon alan.g.dixon at gmail.com
Tue Apr 3 15:32:55 UTC 2007


Robert:

Yes, I'll be happy to make an issue - I was hesitating because I
wasn't sure where the issue was (search? node? aggregator2?). I've
just gone and looked at it again and I realise it's slightly more
complicated. The sql I referred to in node_update_index is this:

$result = db_query_range('SELECT GREATEST(IF(c.last_comment_timestamp
IS NULL, 0, c.last_comment_timestamp), n.changed) as last_change,
n.nid FROM {node} n LEFT JOIN {node_comment_statistics} c ON n.nid =
c.nid WHERE n.status = 1 AND ((GREATEST(n.changed,
c.last_comment_timestamp) = %d AND n.nid > %d) OR (n.changed > %d OR
c.last_comment_timestamp > %d)) ORDER BY GREATEST(n.changed,
c.last_comment_timestamp) ASC, n.nid ASC', $last, $last_nid, $last,
$last, $last, 0, $limit);

so I was wrong in my reference to the node's created value, since it's
the changed value that matters. And indeed the changed value was okay
- it was the last_comment_timestamp which was set to next May. And
that was because the comments module initializes the
last_comment_timestamp in the node_comment_statistics as the node's
created value (hehe, so i was sort of right ..). In other words,
another option is:

5. A bug in the comments module on line 290 - when a new node is
created, the last_comment_timestamp of the node_comment_statistics
should be initialized to the current date, or the lesser of the
current date and the node's created date.




On 4/3/07, Robert Douglass <rob at robshouse.net> wrote:
> Alan,
>
> nice report. Is there an issue in the issue queue for this? It seems a
> simple matter to add a clause to the query to ignore future nodes, and
> this isn't the first time I've heard of the problem. I'd be happy to
> work on this if nobody already is. Point me to an existing issue or one
> that you create. Thanks.
>
> Robert Douglass
>
> Alan Dixon wrote:
> > I think i've just figured out a problem with a site I'm working on and
> > wanted some wisdom from ths list. The site is
> > http://community.telecentre.org/ (not that it matters).
> >
> > The problem:
> >
> > The problem was that the search module's database stopped getting
> > updated (i.e., new material wasn't showing up in searches). I looked
> > at the search_dataset table and discovered that the biggest nid (i.e.,
> > sid) was from a node that was published about 9 months ago (hmm, seems
> > like most folks don't believe a site search anyway?).
> >
> > The diagnosis:
> >
> > So, I ran some debugging and discovered that the sql in
> > node_update_index (the one that tells search whether there are any new
> > nodes to spider) was returning no rows all the time, even though there
> > was lots of new content. After struggling with the logic in the SQL, I
> > think I figured out that the problem was a single node which had
> > gotten a date of May 2007 in the created field. I don't think that's
> > normally a problem, but the node_update_shutdown function (which is
> > invoked in case search gets aborted because it runs out of time) saves
> > the system variables node_cron_last and node_cron_last_nid as the
> > current node's created and nid values.
> >
> > Conclusion: I think what happened was that the search indexer got
> > aborted while processing a node with a future date. That inserted a
> > future value into node_cron_last, which means that nodes don't get
> > spidered again until that date.
> >
> > Question: (multiple choice to make it easy ...)
> >
> > 1. is this a problem with the node_update_shutdown logic (or the point
> > in node_update_index when the last_change global gets set for it)?
> >
> > 2. Or is it a bug in the aggregator2 module that creates nodes with
> > 'created' set in the future?
> >
> > 3. Or have i misdiagnosed the problem?
> >
> > 4. All of the above ...
> >
> > Comments:
> >
> > I've heard of other mysterious search indexing failures like this. It
> > took me quite a while to figure out what was going on - the logic in
> > what nodes get spidered is pretty complex. Does anyone have any handy
> > tools for such search problem diagnosis? Sounds like a useful addition
> > to the devel module or as a separate one. Something that can explain
> > how many and which nodes will be spidered by the next cron perhaps ...
> >
> >
> >
>
>


-- 
Alan Dixon, Web Developer
http://alan.g.dixon.googlepages.com/


More information about the development mailing list