[development] mysterious search issue

Doug Green douggreen at douggreenconsulting.com
Tue Apr 3 15:25:49 UTC 2007


I've run into a similar problem -- not sure if it is the same one -- but I
never filed an issue.  Some of my comments end up with dates in the future,
and the logic around node_cron_last and node_cron_last_nid breaks when this
happens.
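
To make the failure concrete, here is a minimal model of the selection step.
It assumes the indexer only picks up nodes whose timestamp is later than
node_cron_last, or equal to it with a nid greater than node_cron_last_nid.
The real query in node_update_index is more involved, and the function name
mojo_nodes_to_index and the 'time' field are made up for this illustration.

```php
<?php
/**
 * Simplified model (an assumption, not the real node_update_index SQL):
 * a node is picked up when its timestamp is later than $last, or equal
 * to $last with a nid greater than $last_nid.
 */
function mojo_nodes_to_index($nodes, $last, $last_nid) {
  $picked = array();
  foreach ($nodes as $node) {
    $newer = $node['time'] > $last;
    $same_time_later_nid = ($node['time'] == $last && $node['nid'] > $last_nid);
    if ($newer || $same_time_later_nid) {
      $picked[] = $node['nid'];
    }
  }
  return $picked;
}

// Toy data with small made-up timestamps; pretend "now" is 400.
$example_nodes = array(
  array('nid' => 1, 'time' => 100),
  array('nid' => 2, 'time' => 200),
  array('nid' => 3, 'time' => 900), // future-dated
  array('nid' => 4, 'time' => 300),
);

// Healthy state: everything after (150, nid 1) is selected.
$healthy = mojo_nodes_to_index($example_nodes, 150, 1);

// Stalled state: the markers were saved from the future-dated node 3,
// so nothing qualifies until the clock passes 900.
$stalled = mojo_nodes_to_index($example_nodes, 900, 3);
```

In this model $healthy holds nids 2, 3 and 4, while $stalled is empty. The
same function could serve as a rough check of what the next cron run would
pick up, given the two stored variables.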

So I wrote the following custom cron hook to fix this:

function mojo_cron() {
  // Warn when the search index lags far behind the newest node.
  $max_sid = db_result(db_query('SELECT MAX(sid) FROM {search_index}'));
  $max_nid = db_result(db_query('SELECT MAX(nid) FROM {node}'));
  if ($max_sid + 1000 < $max_nid) {
    watchdog('mojo', t('search indexing is behind (%sid of %nid)', array('%sid' => $max_sid, '%nid' => $max_nid)), WATCHDOG_NOTICE);
  }

  // If node_cron_last is unset or lies in the future, reset it to the newest
  // "created" time among nodes that are already in the search index.
  $last = variable_get('node_cron_last', 0);
  if (!$last || $last > time()) {
    $max_created = db_result(db_query('SELECT MAX(created) FROM {node} n INNER JOIN {search_index} s ON s.sid = n.nid'));
    variable_set('node_cron_last', $max_created);
    watchdog('mojo', t('fixed cron last time'), WATCHDOG_NOTICE);
  }
}
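
For trying the repair decision outside of Drupal, here is a minimal sketch
of the same check as a plain PHP function. The name mojo_repaired_cron_last
and its parameters are hypothetical. The empty-index guard and the clamp to
the current time are extra assumptions that the hook above does not make;
they cover an empty search index and an indexed node that itself carries a
future date.

```php
<?php
/**
 * Hypothetical helper: decide what node_cron_last should be reset to.
 *
 * $last        - stored node_cron_last value (0 if never set)
 * $max_created - newest "created" timestamp among indexed nodes, or NULL
 * $now         - current Unix time
 *
 * Returns the value to store, or NULL when no change should be made.
 */
function mojo_repaired_cron_last($last, $max_created, $now) {
  // A value that is set and not in the future needs no repair.
  if ($last && $last <= $now) {
    return NULL;
  }
  // Assumption: with nothing indexed yet there is no value to fall back to.
  if (!$max_created) {
    return NULL;
  }
  // Assumption: never store a future time, even if an indexed node has one.
  return min((int) $max_created, $now);
}
```

In a cron hook it would be fed variable_get('node_cron_last', 0), the result
of the MAX(created) query, and time(), with variable_set() called only when
the return value is not NULL.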

Doug Green
904-583-3342
www.douggreenconsulting.com
 
Bringing Ideas to Life with Software Artistry and Invention...
Providing open source software political solutions

-----Original Message-----
From: development-bounces at drupal.org [mailto:development-bounces at drupal.org]
On Behalf Of Mark Leicester
Sent: Tuesday, April 03, 2007 11:12 AM
To: development at drupal.org
Subject: Re: [development] mysterious search issue

Hi Alan,
Is there any chance it's to do with this: http://drupal.org/node/90128 ?
Hope this helps!
Mark

On 3 Apr 2007, at 15:58, Alan Dixon wrote:

> I think I've just figured out a problem with a site I'm working on and
> wanted some wisdom from this list. The site is
> http://community.telecentre.org/ (not that it matters).
>
> The problem:
>
> The problem was that the search module's database stopped getting
> updated (i.e., new material wasn't showing up in searches). I looked
> at the search_dataset table and discovered that the biggest nid (i.e.,
> sid) was from a node that was published about 9 months ago (hmm, seems
> like most folks don't believe a site search anyway?).
>
> The diagnosis:
>
> So, I ran some debugging and discovered that the sql in
> node_update_index (the one that tells search whether there are any new
> nodes to spider) was returning no rows all the time, even though there
> was lots of new content. After struggling with the logic in the SQL, I
> think I figured out that the problem was a single node which had
> gotten a date of May 2007 in the created field. I don't think that's
> normally a problem, but the node_update_shutdown function (which is
> invoked in case search gets aborted because it runs out of time) saves
> the system variables node_cron_last and node_cron_last_nid as the
> current node's created and nid values.
>
> Conclusion: I think what happened was that the search indexer got
> aborted while processing a node with a future date. That inserted a
> future value into node_cron_last, which means that nodes don't get
> spidered again until that date.
>
> Question: (multiple choice to make it easy ...)
>
> 1. is this a problem with the node_update_shutdown logic (or the point
> in node_update_index when the last_change global gets set for it)?
>
> 2. Or is it a bug in the aggregator2 module that creates nodes with
> 'created' set in the future?
>
> 3. Or have I misdiagnosed the problem?
>
> 4. All of the above ...
>
> Comments:
>
> I've heard of other mysterious search indexing failures like this. It
> took me quite a while to figure out what was going on - the logic for
> which nodes get spidered is pretty complex. Does anyone have any handy
> tools for such search problem diagnosis? Sounds like a useful addition
> to the devel module or as a separate one. Something that can explain
> how many and which nodes will be spidered by the next cron perhaps ...
>
>
>
> -- 
> Alan Dixon, Web Developer
> http://alan.g.dixon.googlepages.com/



