Managing node revisions with a backend like subversion?
I posted this topic in the development forum on drupal.org: http://drupal.org/node/50682

Someone suggested I send it to this list for additional input and feedback.

Summary:

Why do we reinvent the wheel of doing revision control when we could (possibly) just use another revision control system to manage the change history of node bodies? We're currently storing the entire body for each revision, which is wasteful, especially if the node is large or changes frequently. For example, if I were going to import the entire manual for the software project I develop at my day job into a Drupal site, a) I'd definitely need to save revision history and b) that history would grow enormous if the full text of each page was saved on every change.

One (complicated) option would be to store diffs in the node_revision table, just like most revision control systems do. A (maybe) better option would be to just store the name of a Subversion tag (or equivalent). We could still cache the full text of the currently active revision in the Drupal DB (for performance, and to minimize changes to other parts of the Drupal code base), but we could split off the work of maintaining the whole revision history ourselves and use a much more powerful tool for the job.

Then it'd be trivial to add much better revision support within Drupal, for example a "blame annotated" view of a node, powerful diff viewing across revisions, and perhaps even something crazy like branching and merging node text (possibly integrated with taxonomy?). For more info, check out the forum node I mentioned above.

I'd be willing to help make this a reality, but I wanted to know:

a) Is there some obvious reason this hasn't been done (other than lack of developer time)?

b) What's the right way to go about this? I'm fairly new to Drupal, especially the core code. There seem to be numerous layers of abstraction for everything, so part of me (wishfully?) thinks there's some obvious layer at which to make this change, and very little else would need to be modified. Am I being crazy? ;)

c) Does anyone else think this would be a good idea, worth doing, etc.?

Thanks!
-Derek

p.s. It seems like the forum topic would be a better place for this discussion, but I'm new to this list, so I don't know if people prefer talking about this via email instead....
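For readers who want to see what the "store diffs" option amounts to, here is a minimal sketch in Python (not Drupal code). It keeps the first revision in full, stores each later revision as copy/insert instructions against its predecessor, and caches the full text of the active revision as suggested above. The names `make_delta`, `apply_delta` and `DeltaHistory` are hypothetical, and the delta format is an illustrative assumption, not what Subversion or any other tool actually uses.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Encode new_lines as copy/insert instructions relative to old_lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: only the line range is stored, not the text.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or changed lines must be stored literally.
            delta.append(("insert", new_lines[j1:j2]))
        # "delete" needs no instruction: the old lines are simply not copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer revision from the older one plus a delta."""
    new_lines = []
    for op in delta:
        if op[0] == "copy":
            new_lines.extend(old_lines[op[1]:op[2]])
        else:
            new_lines.extend(op[1])
    return new_lines


class DeltaHistory:
    """Keeps the first revision in full and every later one as a delta."""

    def __init__(self, body):
        self.base = body.splitlines()
        self.deltas = []
        self.current = list(self.base)  # cached full text of the active revision

    def save(self, body):
        new_lines = body.splitlines()
        self.deltas.append(make_delta(self.current, new_lines))
        self.current = new_lines

    def revision(self, n):
        """Rebuild revision n (0 is the first) by replaying n deltas."""
        lines = self.base
        for delta in self.deltas[:n]:
            lines = apply_delta(lines, delta)
        return "\n".join(lines)
```

Reading the active revision costs nothing extra because it is cached; only older revisions pay the replay cost, which matches the expectation that viewing history is an infrequent operation.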
Why do we reinvent the wheel of doing revision control when we could (possibly) just use another revision control system to manage the change history of node bodies? We're currently storing the entire body for each revision, which is wasteful, especially if the size of the node is large, or it's changing frequently. For example, if I was going to import the entire manual for the software project I develop on my day job into a drupal site, a) I'd definitely need to save revision history and b) that history would grow enormous if the full text of each page was saved on every change.
The idea of storing only diffs is good in principle, but the devil is in the details:

- The need is not there for all types of sites. Only some need it, for example a documentation project. Most don't (e.g. blogs, news articles, etc.).
- This would force a dependency on Subversion. What about shops that use CVS, or others that use bzr, etc.?
- A dependency on an external program would be slow, since this has to fork a process, execute the backend, and get the data. Granted, if we cache the latest text and only do this to get the diffs (an infrequent operation), then it is less of an issue.
- How does Wikipedia handle this? They do it all day, and it works well with a visual diff and all the nice things, and it is PHP, MySQL and GPL, so we can freely plagiarize it.

The proper way to handle this is to have a pluggable storage back end for nodes, just like Linux handles pluggable authentication modules (PAM): a revisions API with one implementation that uses Wikimedia's approach, another that uses Subversion, another that is a null interface, etc.

My $0.02 Cdn.
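To make the pluggable back end idea concrete, here is a rough sketch in Python rather than Drupal's PHP. Every name in it (`RevisionBackend`, `NullBackend`, `FullTextBackend`, `get_backend`) is hypothetical and only illustrates the shape such an API might take; it is not an existing Drupal interface. A Subversion or wiki-style back end would be a further subclass and is deliberately left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RevisionBackend(ABC):
    """Hypothetical revisions API that every storage back end implements."""

    @abstractmethod
    def save(self, node_id: int, body: str) -> str:
        """Store a new revision and return an opaque revision identifier."""

    @abstractmethod
    def load(self, node_id: int, revision_id: str) -> str:
        """Return the body stored under the given revision identifier."""

    @abstractmethod
    def history(self, node_id: int) -> List[str]:
        """Return the known revision identifiers, oldest first."""


class NullBackend(RevisionBackend):
    """Keeps no history, for sites (blogs, news) that do not need revisions."""

    def __init__(self) -> None:
        self._bodies: Dict[int, str] = {}

    def save(self, node_id: int, body: str) -> str:
        self._bodies[node_id] = body  # overwrite: only the latest text survives
        return "current"

    def load(self, node_id: int, revision_id: str) -> str:
        return self._bodies[node_id]

    def history(self, node_id: int) -> List[str]:
        return ["current"] if node_id in self._bodies else []


class FullTextBackend(RevisionBackend):
    """Stores every revision in full, the behaviour the thread starts from."""

    def __init__(self) -> None:
        self._revisions: Dict[int, List[str]] = {}

    def save(self, node_id: int, body: str) -> str:
        revisions = self._revisions.setdefault(node_id, [])
        revisions.append(body)
        return str(len(revisions))  # identifiers are "1", "2", ...

    def load(self, node_id: int, revision_id: str) -> str:
        return self._revisions[node_id][int(revision_id) - 1]

    def history(self, node_id: int) -> List[str]:
        count = len(self._revisions.get(node_id, []))
        return [str(i) for i in range(1, count + 1)]


# Registry a site could select from in its configuration (an assumption).
BACKENDS = {"null": NullBackend, "fulltext": FullTextBackend}


def get_backend(name: str) -> RevisionBackend:
    return BACKENDS[name]()
```

Callers only ever see `save`, `load` and `history`, so a site on limited hosting could stay on the null or full-text back end while another site plugs in an external repository.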
I've thought of this before, but I personally thought of it in terms of imports/exports, and generating a filesystem access mechanism for Drupal, which could then be stuck entirely into a repository. The actions/workflows would help with this too, as you could trigger a commit every time a node is changed.

http://drupal.org/node/42254

On 25 Feb 2006, at 3:11 AM, Khalid B wrote:
- This would force a dependency on Subversion. What about shops that use CVS, or others that use bzr, etc.?
Not unless we develop a standard 'repository' module, kind of like how we have an image module instead of separate bmp, jpg, etc. modules.

--
Adrian Rossouw
Drupal developer and Bryght Guy
http://drupal.org | http://bryght.com
Derek Wright wrote:
I posted this topic as a development forum on drupal.org:
Someone suggested I send it to this list for additional input and feedback.
Summary:
Why do we reinvent the wheel of doing revision control when we could (possibly) just use another revision control system to manage the change history of node bodies?
Mainly because (contrary to what I usually say) most people still care about people with limited hosting options. Try installing svn over ftp. :p

Cheers,
Gerhard
On 25 Feb 2006, at 4:18 AM, Gerhard Killesreiter wrote:
Try installing svn over ftp. :p
svn might not even be installed on the same server. Imagine it in the framework of a publishing workflow: all your content creators edit the content on the intranet and commit their final text to Subversion, then you update the edge server, and all the data is updated.

--
Adrian Rossouw
Drupal developer and Bryght Guy
http://drupal.org | http://bryght.com
Participants (4): Adrian Rossouw, Derek Wright, Gerhard Killesreiter, Khalid B