[drupal-devel] [bug] Make node_save() recursion-aware

I1uv4t4r drupal-devel at drupal.org
Tue Sep 6 18:28:51 UTC 2005


Issue status update for 
http://drupal.org/node/27007
Post a follow up: 
http://drupal.org/project/comments/add/27007

 Project:      Drupal
 Version:      cvs
 Component:    node system
 Category:     bug reports
-Priority:     normal
+Priority:     critical
 Assigned to:  jvandyk
 Reported by:  jvandyk
 Updated by:   I1uv4t4r
 Status:       patch (code needs review)

I am creating a module that creates an additional node in
hook_nodeapi('insert', ...). The additional node was created OK, but
the actual node (where hook_nodeapi was invoked for in the first place)
not: information like taxonomy was lost. I couldn't find anything to fix
this bug, untill I came accross this post.


The problem appearently was that I was using node_save recursive, and
my module name starts with an A and the taxonomy module with a T (duh).
Well, the patch above (save_hooks3.diff) certainly fixed this problem
for me by creating the node in the 'insert post' hook instead of the
'insert' hook. Very nice!


I agree with jvandyk, there's no downside to this patch, there are some
additional calls but at least module_implements 'caches' the list of
modules that implement a certain hook, hook_nodeapi in this case. I
myself cannot find a practical implementation of the 'insert pre' hook
(or 'update pre'), though. jvandyk says something about sending an
e-mail, but this can be done just as wel in either 'insert' or 'insert
post'.


By the way, another thing that worked for me was changing in
node_invoke_nodeapi:
foreach (module_implements('nodeapi') as $name) {
to
$implementations = module_implements('nodeapi');
foreach ($implementations as $name) {
I don't know why this worked, but it probably has something todo with
the static variable in module_implements.


I think this patch is very usefull and this bug /should be fixed in
4.7/.


Thank you.




I1uv4t4r



Previous comments:
------------------------------------------------------------------------

Tue, 19 Jul 2005 02:14:16 +0000 : jvandyk

Attachment: http://drupal.org/files/issues/node_serialized2x.patch (664 bytes)

node_save() saves a node, then calls insert or update hooks. These hooks
may run functions that, in turn, call node_save().


This all works fine except that if a revision is being created, the
revision gets serialized twice -- once on the original node_save() call
and once on the callback's node_save() call.


The attached two-line patch simply inserts a boolean variable called
rev_serialized to keep track of whether the serialization has already
been done.


I'd like to see cvs and 4.6 patched.


This patch is necessary so I can release actions and workflow for 4.6.


Background: http://drupal.org/node/24326




------------------------------------------------------------------------

Tue, 19 Jul 2005 18:36:23 +0000 : Dries

I don't like this patch.  It's a hairy fix.  There must be a better way
to avoid recursion in first place.




------------------------------------------------------------------------

Thu, 18 Aug 2005 01:26:21 +0000 : moshe weitzman

gosh, this a two line change with zero performanhce impact. seems like a
useful and minimally invasive fix for our current dilemma. we'll remove
it once a better fix comes around. is a shame to hold back
workflow/actions for such a minor issue. any others want to comment or
suggest a different solution?/




------------------------------------------------------------------------

Thu, 18 Aug 2005 04:57:11 +0000 : chx

If there would be a little comment why this is needed, then I'd +1 it
heartily.




------------------------------------------------------------------------

Thu, 18 Aug 2005 11:59:14 +0000 : killes at www.drop.org

Please commit. In case revisions shouldn't make it (for performance
reasons ;) we need it fixed for 4.7.




------------------------------------------------------------------------

Thu, 18 Aug 2005 14:22:15 +0000 : moshe weitzman

The first user of this will be workflow+actions modules. They listen on
nodeapi's insert/update operations. When a node is seen there,
arbitrary actions may fire according to the admin's wishes. So for
example, a node may be promoted or may become sticky. To do that
properly, the actions should call node_save() again with
$node->promote=1. Thus, we are recursively calling node_save() which
leads to a double serialize problem. This 2 line fixes it.


There may be more nodes which benefit from this.




------------------------------------------------------------------------

Thu, 18 Aug 2005 14:23:24 +0000 : moshe weitzman

I meant to say - "there may be more *modules* which benefit from this."




------------------------------------------------------------------------

Thu, 18 Aug 2005 21:50:36 +0000 : Dries

Won't commit.  If you are calling node_save() recursively, something is
really wrong.  For one, you can no longer guarantee that the node is in
a 'safe state'. Like, what happens if you saved half a node and you
recursively try saving that node again?  This needs a better solution.




------------------------------------------------------------------------

Thu, 18 Aug 2005 23:21:41 +0000 : chx

I have a possible solution, which does not need core change. Keep a
stack of necessary actions in SESSION and pop the stack first in _exit
then subsequent _init calls 'till the stack is empty.




------------------------------------------------------------------------

Fri, 19 Aug 2005 00:58:11 +0000 : jvandyk

Attachment: http://drupal.org/files/issues/save_hooks.diff (1.83 KB)

OK, here is an alternate approach. We leave node_save() as it is, but
add two new nodeapi hooks to node_submit().


For reference, here's the meat from my message to the devel list, as
chx wants background:


The nodeapi calls for nodes go like this on node submission:


1. nodeapi validate hook
2. (node is saved)
3. nodeapi insert/update hook
   a... modules respond
   b... modules respond
   c... modules respond
   ...
   workflow modules responds, fires actions, the last of which saves
the node (go to 2)
   x... modules respond
   y... modules respond
   z... modules respond
As you can see, we have some problems. Because module_invoke() works
alphabetically, any modules with names alphabetically after "workflow"
have not yet responded when the workflow hook is invoked. Remember, the
workflow hook activity looks like this:


1. nodeapi insert/update is caught by workflow module
2. workflow module fires the 'transition pre' workflow hook to let
modules know a transition is about to occur
3. modules have a chance to veto the workflow transition
4. if no module vetoes the transition, the 'transition post' workflow
hook is fired so modules can respond to the workflow state change.
So to move ahead, we need a 1a nodeapi hook that says, "validation
passed and this node will now be saved" so workflow changes can happen
there, and/or a 4 nodeapi hook that says, "the node has been saved, all
modules have done their thing, it's safe to do a node_load and get a
node that is in a consistent state".


I'd like to see both because it gives maximum flexibility for workflow.
For example, it's much more efficient to change the status of the node
to published after validation but before saving than it is to use the
current method of saving the node again.


So the revised flow looks like this:


1. nodeapi validate hook
1a. node is guaranteed validated; nodeapi save pre hook (workflow fires
efficient actions like changing status)
2. (node is saved)
3. nodeapi insert/update hook
   a... modules respond
   b... modules respond
   c... modules respond
   ...
   x... modules respond
   y... modules respond
   z... modules respond
4. node is guaranteed saved; nodeapi save post hook (workflow fires
actions like email notification)


------------------------------------------------------------------------

Fri, 19 Aug 2005 19:00:53 +0000 : jvandyk

Attachment: http://drupal.org/files/issues/save_hooks2.diff (1.83 KB)

After discussion with chx and moshe, changed names of hooks to be update
pre, update post, insert pre, insert post so that listeners have a
better context in which to make decisions.


I don't like the _exit hook solution because that will slow down every
page.


Todd Grimason linked to this page [1] of Ruby on Rails hooks on
drupal-dev.


Although these are 4 hooks, note that only 2 will ever execute at a
given time, because we are either doing an insert or an update.
[1] http://api.rubyonrails.com/classes/ActiveRecord/Callbacks.html




------------------------------------------------------------------------

Fri, 19 Aug 2005 19:10:46 +0000 : chx

I like this, but if this gets in, next issue will be to clean up
'validate'...




------------------------------------------------------------------------

Fri, 26 Aug 2005 16:18:46 +0000 : Kjartan

>From the conversations on IRC here is where I ended up on nodeapi calls.


Nodeapi calls when creating a new node:
  default values -> form pre -> form post -> validate -> finalize
insert -> insert


Nodeapi calls when updating an existing node:
  load -> form pre -> form post -> validate -> finalize update ->
update


Not sure I completely agree with splitting finalize, but I am sure
those who support it will post nice examples to make it a clear choice
:)


This would remove the hack in node_form with validating an empty node,
and let modules perform actions just before its saved.




------------------------------------------------------------------------

Tue, 30 Aug 2005 16:17:12 +0000 : jvandyk

Attachment: http://drupal.org/files/issues/save_hooks3.diff (1.78 KB)

I have updated this patch to work with HEAD after the recent node
changes.


Here's a review of this patch:


Why do we need the 'update pre' hook?


Because we need a nodeapi call where modules can assume that they are
working with a node that has passed validation. For example, if actions
such as e-mail sending are going to fire, you don't want them to fire
and then a module later in the node_invoke_nodeapi() sequence to throw
a validation error. You want validation to pass as a separate step, and
then be able to fire actions without worrying that a node_save() might
not happen.


Why do we need the 'update post' hook?


Because we need to guarantee that we are working with a "clean" node.
If you use the 'insert' or 'update' hooks, a module later in the
node_invoke_nodeapi() sequence may make changes to database tables that
will affect the node. Modules typically use the 'insert' and 'update'
hooks to modify their own private tables. To ensure consistency and
avoid corruption, you want to make certain that all modules have
completed their own private database updates before moving on. Why do
you want to make certain of this? Because you do not know what actions
are going to do. Actions (as in actions.module) are by definition
arbitrary. It is much better to have all tables in a consistent state
before firing actions. You want to make sure that if an action does a
node_load it will get a clean node, including all the updates that
listening modules do. Steven, please chime in on this as this was your
concern.


One other point on the 'update post' hook. It is conceivable that
actions fired may run a long time or run into problems. Of course
actions need to be written as cleanly as possible. 'update post' gives
actions a chance to decide that they will do things that may have
problems (xmlrpc posts that may timeout, for example) and breathe easy
knowing that even if there is a timeout or other error, the node has
already been saved.


Do we need 'insert pre' and 'update pre'? Why not just use 'save pre'?


I think that since we already know the context, it is nice to pass that
along to modules so they do not have to test for the existence of
$node->nid to waste a cycle figuring out something we already know.


What about naming?


Kjartan has suggested changing the name 'update pre' to 'finalize
update' and 'update insert' to 'finalize insert'. I have left the names
as is because I think both hooks are important and I think the current
names are easiest to understand. 'finalize' does not give a clue as to
whether it is before or after the actual node_save(). But I am open to
better naming if it is easy to understand for the developer.


What's the downside?


I do not see a downside to this patch. It adds two nodeapi calls on
insert, or two on update. It is not a performance hit. It adds only 4
lines of code. It does not require anyone to rewrite anything. And it
is the best solution we have to the problem.







More information about the drupal-devel mailing list