Why are we reinventing the wheel of revision control when we could (possibly) just use an existing revision control system to manage the change history of node bodies? We currently store the entire body for each revision, which is wasteful, especially if the node is large or changes frequently. For example, if I were to import the entire manual for the software project I develop at my day job into a Drupal site, a) I'd definitely need to save revision history and b) that history would grow enormous if the full text of each page were saved on every change.
The idea of storing only diffs is good in principle, but the devil is in the details:

- The need is not there for all types of sites. Only some need it, for example a documentation project. Most don't (e.g. blogs, news articles, etc.).
- This would force a dependency on Subversion. What about shops that use CVS, or bzr, etc.?
- A dependency on an external tool would be slow, since it has to fork a process, execute the back end, and get the data. Granted, if we cache the latest text and only do this to get the diffs (an infrequent operation), then it is less of an issue.
- How does Wikipedia handle this? They do it all day, and it works well with a visual diff and all the nice things, and it is PHP, MySQL and GPL, so we can freely plagiarize it.

The proper way to handle this is to have a pluggable storage back end for nodes, just like Linux handles pluggable authentication modules (PAM). There would be a revisions API with one back end that uses Wikimedia's approach, another that uses Subversion, another that is a null interface, etc.

My $0.02 Cdn.
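To make the pluggable back end idea a bit more concrete, here is a rough sketch in Python (not Drupal code, and not how Wikipedia or Subversion do it). All names (`RevisionBackend`, `NullBackend`, `DiffBackend`) are made up for illustration. The diff back end caches the latest text, as suggested above, and keeps older revisions only as reverse deltas computed with the standard `difflib` module.

```python
import difflib
from abc import ABC, abstractmethod


class RevisionBackend(ABC):
    """Hypothetical storage interface; each back end decides how history is kept."""

    @abstractmethod
    def save(self, node_id, body):
        """Record a new revision of the node body."""

    @abstractmethod
    def latest(self, node_id):
        """Return the current body, or None if the node is unknown."""

    @abstractmethod
    def history(self, node_id):
        """Return every body the back end can still produce, oldest first."""


class NullBackend(RevisionBackend):
    """Keeps only the current text; for sites that need no history (blogs, news)."""

    def __init__(self):
        self._current = {}

    def save(self, node_id, body):
        self._current[node_id] = body

    def latest(self, node_id):
        return self._current.get(node_id)

    def history(self, node_id):
        return [self._current[node_id]] if node_id in self._current else []


class DiffBackend(RevisionBackend):
    """Caches the latest text in full and stores older revisions as reverse deltas."""

    def __init__(self):
        self._latest = {}  # node_id -> lines of the newest revision
        self._deltas = {}  # node_id -> reverse deltas, oldest first

    def save(self, node_id, body):
        new_lines = body.splitlines(keepends=True)
        old_lines = self._latest.get(node_id)
        if old_lines is not None:
            # Describe how to rebuild the OLD text from the NEW one.
            matcher = difflib.SequenceMatcher(None, new_lines, old_lines, autojunk=False)
            delta = []
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    delta.append(("copy", i1, i2))            # reuse lines of the newer text
                else:
                    delta.append(("data", old_lines[j1:j2]))  # lines only the older text had
            self._deltas.setdefault(node_id, []).append(delta)
        self._latest[node_id] = new_lines

    def latest(self, node_id):
        lines = self._latest.get(node_id)
        return None if lines is None else "".join(lines)

    @staticmethod
    def _apply(delta, newer_lines):
        older = []
        for op in delta:
            if op[0] == "copy":
                older.extend(newer_lines[op[1]:op[2]])
            else:
                older.extend(op[1])
        return older

    def history(self, node_id):
        if node_id not in self._latest:
            return []
        lines = self._latest[node_id]
        bodies = ["".join(lines)]
        # Walk backwards from the newest revision, undoing one change at a time.
        for delta in reversed(self._deltas.get(node_id, [])):
            lines = self._apply(delta, lines)
            bodies.append("".join(lines))
        return bodies[::-1]
```

A site would pick one back end in its configuration and the rest of the code would only talk to the `RevisionBackend` interface. A Subversion or CVS back end would be one more subclass that shells out to the external tool, and that is where the fork cost mentioned above would show up.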