Project: Drupal Version: cvs Component: node system Category: tasks Priority: normal Assigned to: killes@www.drop.org Reported by: killes@www.drop.org Updated by: killes@www.drop.org Status: patch Attachment: http://drupal.org/files/issues/revisions_15.patch (52.39 KB) There was some error in node_save and also the patches to the database.inc files got lost... killes@www.drop.org Previous comments: ------------------------------------------------------------------------ May 5, 2004 - 17:25 : killes@www.drop.org Currently all node revisions are stored in a serialized field in node.table and retrieved for _each_ page view although they are rarely needed. However, we have agreed that serializing data is bad and that we should try to keep the memory foot print pf Drupal small. Therefore I propose to create a separate revisions table which would be in principle identical to the node table, only that it could have several old copies of the same node. Extra data added by other modules could be added in a serialized field unless we find a better solution. ------------------------------------------------------------------------ May 5, 2004 - 18:06 : jhriggs I too think the serialized approach is less than desirable, but here's an alternative. This would likely take some considerable rework in core and contrib, but the following is how we handle similar types of situations in our databases at work. It is more elegant that a separate table, and avoids the (almost exact) duplication of a table. Instead of separate tables, keep all revisions of nodes in the node table as follows: * add field: active (0/1 or Y/N) * add field: revision * every revision of a node is stored in the node table; however, only one revision can be active at any given time * nid can no longer be unique -- primary/unique key becomes (nid, active) * any time a node is loaded, updated (without revision), etc., the active version is used. Thoughts? ------------------------------------------------------------------------ May 5, 2004 - 18:57 : killes@www.drop.org I am not opposed to your scheme, but I want to stress the following: * Duplicating a table's structure is not bad (IMHO) as long as the content is different. * having two tables will allow us to have a rather small node table. This is (maybe) a performance gain. ------------------------------------------------------------------------ May 5, 2004 - 19:37 : jhriggs I don't necessarily think that duplicating a table's structure is _bad_. It just seems to be wasteful and a pain to maintain. (Every change to the node table is made twice...easy to do, but also easy to miss perhaps.) As for performance, as long as nid and the active indicator are indexed, there shouldn't be any performance loss. Also, archiving an old version when making a new revision will be much simpler: just change the active indicator rather than copying an entire node to another table (and ensuring everything gets copied...again a potential maintainance issue). To be honest, I would just like to see the serialized data go away, regarless of what approach is taken. ------------------------------------------------------------------------ July 30, 2004 - 20:49 : Nick Nassar Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_07-30-2004.pa... (10.47 KB) I'm interested in using Drupal for a large scale wiki-type project. In order to do this, I need revisions to be in their own table. Attached is a patch to do just that. Most of the changes are pretty self explanatory. Spreading out node data across two tables meant that I had to add database functions to do locking/transactions. Without this, race conditions in which the database becomes corrupted are possible. ------------------------------------------------------------------------ July 30, 2004 - 20:54 : Nick Nassar Oh yeah... The patch is a diff against Drupal CVS ------------------------------------------------------------------------ July 31, 2004 - 01:00 : Anonymous Gerhard speaking. Nick, thanks a lot for your nice patch! It saves me a great deal of labour. I looked through it and immediately liked it. You not only put the old revisions into a new table but also the current one. Do you have an estimate how much more expensive the additional join is? Besides a few minor coding style issues I found a major one: Just a few hours before you uploaded your patch JonBob's node access patch hit core. That means your patch won't apply anymore as all the queries you change have been changed. Can I bug you to update your patch? ------------------------------------------------------------------------ July 31, 2004 - 02:11 : Anonymous Also I think that your upgrade path loses existing revisions. ------------------------------------------------------------------------ July 31, 2004 - 03:39 : drumm I think this is the proper way to do things. No columns are duplicated, there is no serialized data, and only the fields that are logically revised are stored. Nothing jumped out at me as a way to have my node modules be able to keep a table of revisions of additional fields. I'm guessing this could be done within the confines of _insert and _update. Assuming the upgrade path works and modules can extend it I give it a +1. ------------------------------------------------------------------------ July 31, 2004 - 15:40 : Nick Nassar It figures that just as I finish a big patch, another patch comes along and breaks it. Oh well, it should be a pretty easy to fix. I'll work on it. Fixing the upgrade path to keep revisions should be fairly painless. I found another issue that needs to be fixed before this patch gets merged. There format of a node needs to be stored for each revision. Otherwise, for modules that store a format for the nodes, such as page and book, if you write one revision in PHP and the next in HTML, the PHP revision will be displayed as HTML. This is part of a larger issue of how node modules should store revisions of additional fields. I think each module that wants to do this should create another table with (nid, revid) as the primary key. Just as when they want to add fields to a node they create another table with nid as the primary key. As far as performance goes, for sites that make heavy use of revisions, an extra join on primary keys is going to be a lot faster than grabbing all of the revisions from that database everytime. We would need to run benchmarks to determine is the overall difference in speed is for an average site is a gain or a loss. I'm guessing it's very minor either way. ------------------------------------------------------------------------ August 23, 2004 - 14:55 : Nick Nassar Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_08-23-2004.pa... (10.92 KB) Here's an updated patch against CVS that puts revisions in their own table, provides an upgrade path, and fixes the format related bugs in the last patch. Hopefully, this can make it into CVS as soon as the freeze is over. ------------------------------------------------------------------------ August 23, 2004 - 15:10 : moshe weitzman Interesting patch ... drumm's question is still outstanding. how do modules store revisions of their fields? Are they expected to manage this on their own? Thats not how it works today. As an aside, i am seeing profile_ fields in my node.revisions column. One could argue that those need not be saved. They pertain to the node author, not to the node itself. ------------------------------------------------------------------------ August 23, 2004 - 17:14 : Nick Nassar Having modules be responsible for storing revisions of their own fields is a side-effect of storing revision data in tables. There's really no way around it. However, revisions generally don't make sense for node types that don't have PHP/HTML content, such as polls. I think it's going to be a pretty rare scenario for a new node type to want another field to change per-revision, so it's a pretty good trade-off. Storing fields that shouldn't be part of revisions, such as the profile_ fields, is a side-effect of storing revisions as serialized objects. Applying this patch will free up that wasted space. :) ------------------------------------------------------------------------ August 23, 2004 - 18:20 : Anonymous There should be a hook that let's the module choose whether it supports history. This way a module author can prevent the user from doing something that may break his module or just cause undefined behavior. If the module doesn't support history then don't let the user/admin choose to add history to nodes of that type. Craig ------------------------------------------------------------------------ August 23, 2004 - 20:23 : Nick Nassar I agree, there should be an API change to make specifying support for revisions easier. In the interests of keeping patches small and keeping to one change per patch, I think the API change should be a separate issue. A sort of ad-hoc API to decide whether or not a module supports revisions by default already exists. Instead of having a hook, modules set the default value of the "Create new revision" field in the edit form. The admin can change this option in admin/node/configure/defaults. This patch doesn't change that. Revisions are broken for node types that have their own database structure, like polls, even when storing them as serialized objects. This patch doesn't change that, either. ------------------------------------------------------------------------ October 26, 2004 - 03:35 : moshe weitzman I'm guessing that someone is going to have to demonstrate that this patch performs as well as current drupal before it gets comitted. i think this patch is a few benchmarks from being comitted. ------------------------------------------------------------------------ October 27, 2004 - 02:04 : Nick Nassar Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.pa... (11 KB) I ran some really unscientific benchmarks, and it looks like this patch has a negligible affect on performance. I used apache bench and the database from theregular.org, which doesn't contain any revisions (worst case scenario for this patch) and contains several hundred nodes. Both the patched and unpatched versions hovered between 2.36 and 2.38 requests per second. The command I used was: ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2' http://192.168.0.100/ An updated patch that should apply to CVS is attached. ------------------------------------------------------------------------ October 27, 2004 - 02:05 : Nick Nassar I ran some really unscientific benchmarks, and it looks like this patch has a negligible affect on performance. I used apache bench and the database from theregular.org, which doesn't contain any revisions (worst case scenario for this patch) and contains several hundred nodes. Both the patched and unpatched versions hovered between 2.36 and 2.38 requests per second. The command I used was: ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2' http://192.168.0.100/ An updated patch that should apply to CVS is attached. ------------------------------------------------------------------------ October 27, 2004 - 02:05 : Nick Nassar Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004.pa... (11 KB) I ran some really unscientific benchmarks, and it looks like this patch has a negligible affect on performance. I used apache bench and the database from theregular.org, which doesn't contain any revisions (worst case scenario for this patch) and contains several hundred nodes. Both the patched and unpatched versions hovered between 2.36 and 2.38 requests per second. The command I used was: ab -n50 -C 'PHPSESSID=b01a9f92880ef215b0ed6f1314a5eba2' http://192.168.0.100/ An updated patch that should apply to CVS is attached. ------------------------------------------------------------------------ November 15, 2004 - 06:05 : elias1884 please overthink the revision system default workflow as well. don't look at the revision system as an isolated system but as a part of the whole workflow system! if you combine revisions with the moderation queue the most logic default workflow would be like that: auth user creates node (revision #0) admin approves the node (status = 1, moderation = 0) => node publicly available auth user finds typo and changes node (revision #1, status = 0, moderation = 1) ------- what happens at that point at the moment is, that the node is not accessible anymore at all until the new revision is approved by admin. of course the new revision should not go online until reviewed and approved, this is absolutely correct, but there is no reason to not take the old revision offline, since it was already approved and should therefore be online until the new revision is approved. it is not practical if a node disappears only because the author corrected a typo. ------- admin approves the node (status = 1, moderation = 0) eventhough I first thought a plain boolean active field would not be capable of providing that functionality if finally came to the conclusion, that it can. The only thing to do is to not set that bit, when a new revision is created, but when it is approved (in case moderation is activated under default workflow). Every revision should have its own moderation, status and active field and on approval they are set like this (status=1, moderation=0, active=Y). When you wanna rollback to an old revision, you can chose between all revisions that already have the moderation bit set back to 0 again and the published to 1. There should be an extra permission for rollback! another concern that I have about the default workflow is, that users can't see the content, they have just created, when moderation is enabled. Eventhough, there is a big fat "submission accepted" presented after submissions, unexperienced users tend to question the information those stupid tincans give them, if they can't find their content afterwards. Many users are really lazy bastards and they don't even read the status messages. The best feedback about whether his story was submitted successfully or not of course is, if he can find the story somewhere on the site, maybe with a status message on top of it, mentioning, that the content is currently not publicly available since it has not been approved yet. there should be a my content section under my account, like somebody is trying to do with the workspace module I guess. so my suggestion is to make (status=0, moderation=1) still available for the creator under a my content section somewhere! ------------------------------------------------------------------------ November 24, 2004 - 05:21 : Nick Nassar I agree. The current workflow for moderation queues and revisions needs to change, but this patch isn't the place for it. The patch is already too big, and it only does the backend stuff. Instead of adding more to this patch and making it take even longer to get into core, would you mind creating a new issue for your UI suggestions, so the those changes can be added as a separate patch? Thanks, Nick ------------------------------------------------------------------------ December 11, 2004 - 13:26 : Dries This patch is _much_ needed so I'd love to see someone revive it. In order for this patch to be accepted, the following needs to be done: Update this patch to CVS HEAD. Rename revid to vid. Rename node_rev to node_revisions. Rename node_rev.changed to node_revisions.timestamp. Rename $rnode to $revision. Fix the coding style to match Drupal's: proper spacing, single quotes where possible, proper variable names. Benchmark this patch with a large database with enough revisions. I'd be happy to benchmark this on my local copy of the drupal.org database. The book.log field should probably move to the node_revisions table. This can be done in a separate patch. Investigate whether transactions are well-supported. ------------------------------------------------------------------------ December 13, 2004 - 01:25 : killes@www.drop.org Attachment: http://drupal.org/files/issues/Drupal-Improved_Revision_Schema_10-26-2004-re... (11.02 KB) I've worked a bit on the patch (coding style issues as mentioned by Dries). One thing I noticed is that the patch uses REPLACE. IIRC this needs to be chagned to "UPDATE, if fail INSERT" for pgsql compatibility. Nick, are you still interested in working on that patch? I'd like to know how it works on your site and work on getting it into core. ------------------------------------------------------------------------ December 13, 2004 - 13:33 : Dries Gerhard: your patch does not apply. ------------------------------------------------------------------------ December 13, 2004 - 18:10 : killes@www.drop.org Yes, I know, that was the same version as I mailed to you earlier. ------------------------------------------------------------------------ December 13, 2004 - 22:02 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions.patch (52.96 KB) Ok, upüdated the patch to cvs. ------------------------------------------------------------------------ December 14, 2004 - 09:58 : Dries Some more comments: db_begin_transaction() and db_end_transaction() do not belong in database.inc, but in database.mysql.inc and database.pgsql.inc respectively. The node module calls node_revisionsision_list() which is not defined. (Fxed that on my local copy.) Do db_begin_transaction() and db_end_transaction() deprecate Jeremy's table locking patch? The upgrade path assigns the wrong user ID to each revision. The upgrade path assigns the wrong date to each revision (that or a node's revision page shows the wrong usernames/dates). The coding style needs a bit of work, but we can worry about that later. ------------------------------------------------------------------------ December 14, 2004 - 18:34 : Nick Nassar If you need any help getting those things fixed, just let me know. ------------------------------------------------------------------------ December 14, 2004 - 18:50 : Nick Nassar How this relates to Jeremy's node locking patch: There was lots of discussion, and node locking was decided against because from an end user point of view you never want a node to be locked. He's now advocating for a much simpler patch that warns users if their changes will overwrite someone elses. That patch still has a race condition, which might be fixed using db_begin_transaction(). http://drupal.org/node/6025 ------------------------------------------------------------------------ December 14, 2004 - 23:26 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_0.patch (55.96 KB) Here is an updated patch that tries to address Dries concerns. ------------------------------------------------------------------------ December 15, 2004 - 09:32 : Dries Attachment: http://drupal.org/files/issues/revisions-bug.png (76.06 KB) It didn't fix the aforementioned bugs. See attached screenshot. ------------------------------------------------------------------------ January 6, 2005 - 21:15 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_1.patch (51.77 KB) Ok, here is a new version. Dries and myself worked hart at it, so please have a look. what is still missing - database upgrades for the core modules with an own table - contrib modules need an upgrade too. - do we need nid and vid in both the node and the node_revisions table? - the amount of sql queries means a good stress testing for large databases. ------------------------------------------------------------------------ January 19, 2005 - 22:43 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_2.patch (49.49 KB) Here is an updated patch. We discussed to keep the current title in node module and also in the revisiosn table. This is content duplication but will save many joins as many queries only need the title of a node. Discussion is welcome. ------------------------------------------------------------------------ January 20, 2005 - 00:33 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_3.patch (29.93 KB) I've implemented the aforementioned solution. This makes the patch much smaller. The patch now also removes taxonomy_node_has_term() which wasn't used anywhere. I'd really apprciate if some people could test drive the patch. It will be another huge improvement for 4.6. ------------------------------------------------------------------------ January 20, 2005 - 01:05 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_4.patch (30 KB) Another revision. Steven didn't like my literal $node->vid in queries. ------------------------------------------------------------------------ January 20, 2005 - 02:10 : killes@www.drop.org - database upgrades for the core modules with an own table - contrib modules need an upgrade too. - do we need nid and vid in both the node and the node_revisions table? - the amount of sql queries means a good stress testing for large databases. These issues are still open, btw. Especially the first one needs to be tackled. ------------------------------------------------------------------------ January 25, 2005 - 21:11 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_5.patch (51.13 KB) Here is a patch that has the database tables updated for forum, book, and page module. ------------------------------------------------------------------------ January 29, 2005 - 23:55 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_6.patch (49.18 KB) Yet another update to keep it working with head. The patch now also removes the table definitons for the page table. ------------------------------------------------------------------------ January 29, 2005 - 23:57 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_7.patch (55.69 KB) Sorry, that was the old version, this is the right one. ------------------------------------------------------------------------ January 31, 2005 - 20:55 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_8.patch (55.71 KB) Updated once more. ------------------------------------------------------------------------ January 31, 2005 - 21:52 : Dries Anyone to help review/test this? ------------------------------------------------------------------------ January 31, 2005 - 22:22 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_9.patch (49.29 KB) Updated again, the update functions occurred twice. Thanks Bart. ------------------------------------------------------------------------ February 2, 2005 - 01:27 : killes@www.drop.org Don't know if the db I am using is corrupted or what. I still do have some didficulties. The latest patch is attached. ------------------------------------------------------------------------ February 2, 2005 - 01:27 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_10.patch (49.67 KB) I am probably slowly going mad ... ------------------------------------------------------------------------ February 2, 2005 - 02:54 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_11.patch (48.95 KB) The update issue still needs investigating. This patch is updated for cvs. ------------------------------------------------------------------------ February 2, 2005 - 21:20 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_12.patch (49.83 KB) Ok, here is a new version. I've solved my troubles with book.module. There are still some issues with forum module. Possibly due to inconsistent database. ------------------------------------------------------------------------ February 2, 2005 - 22:31 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_13.patch (49.83 KB) Turns out the drupal.org database had indeed some quirks. Please run this query in your oldest db and tell me the result: select nid,type from node where type like '%/%'; If you get a non-zero result we might need to add another security update. The patch could use still more testing, though. ------------------------------------------------------------------------ February 3, 2005 - 02:16 : killes@www.drop.org Attachment: http://drupal.org/files/issues/revisions_14.patch (49 KB) Ok, we are getting somewhere. At a first glance the update is working. There is a problem remaining: the revisions tab will be shown whether the node has revisions or not. Not sure we can/need to fix this. People with a drupal.org account can log in at http://killes.drupaldevs.org/revision/ and poke around. Your permissions will be the same as on drupal.org. Feel free to vreak everything but don't forget to file complaints here. (Note: this is only a pruned version of the drupal.org database with all project nodes and nodes with nids > 7000 dropped). -- View: http://drupal.org/node/7582 Edit: http://drupal.org/project/comments/add/7582