Issue status update for http://drupal.org/node/1482

 Project:      Drupal
 Version:      cvs
 Component:    book.module
 Category:     feature requests
 Priority:     normal
 Assigned to:  puregin
 Reported by:  BenEng
 Updated by:   puregin
 Status:       patch
 Attachment:   http://drupal.org/files/issues/explode2dir.php (6.64 KB)

The attached command-line PHP script may be useful for testing the XML export patch I supplied.

Assuming your local version of PHP is built with CLI support and XML parser support, you should be able to run the script against an XML export file generated by the book module with my patch.

After you have installed the patch, you can select a book page, click on the 'export XML' link, and save the result as a file, say 'test.xml'. Then run the script. On my system this looks like this:

  % ./explode2dir.php test.xml

This will produce output that looks something like this:

  ./explode2dir.php test.xml
  md5: 9e8ca98c6a8be35c21f31f7937608acc
  weight: 1
  md5: 11a1956a1592feac37abee6b469e62c8
  weight: 0
  md5: ed4c91279d3bed28b56899b75ccaa9aa
  weight: 0

It will generate a directory hierarchy, with one directory per book node. Each directory contains a file containing the node contents and a file 'nid' containing the metadata. You can check, for example, that the md5 signature of the contents matches the md5 signature recorded with the metadata.

Djun

puregin

Previous comments:
------------------------------------------------------------------------

April 5, 2003 - 14:38 : BenEng

Export an entire collaborative book to xml (docbook) so that it can be formatted using xslt (e.g., to rtf or pdf) and printed. It would also be nice to be able to perform the reverse, that is, to import a collaborative book from an uploaded xml (docbook) file.

------------------------------------------------------------------------

June 2, 2004 - 17:10 : moshe weitzman

can someone suggest an xml schema for this? i think we need a general xml schema for nodes.
after that, this becomes a simple matter of nesting node elements (I think)

------------------------------------------------------------------------

February 1, 2005 - 02:42 : Teto

Hi,

Is there any news about such a feature? All I've found about a docbook schema is here: http://docbook.sourceforge.net/projects/schema/

It seems there isn't much in the docbook cvs about that.

Teto.

------------------------------------------------------------------------

May 14, 2005 - 02:11 : puregin

Here's a list of the Book publishing DTDs I know about:

  Name:  ISO 12083:1998//DTD Book//EN (this includes ISO 12093:1993//DTD Mathematics//EN)
  Notes: Committee standard - very general. Used e.g. by University of California Press.
  Ref:   www.xmlxperts.com/bookdtd.htm [1]

  Name:  DocBook
  Notes: Applications - widely used by computer book publishers, e.g. O'Reilly. Good support.
  Ref:   docbook.org [2]

  Name:  TEI/TEI-Lite
  Notes: Applications - scholarly/historical/literary documents
  Ref:   www.tei-c.org [3]

  Name:  MIL-STD-38784 (CALS)
  Notes: Applications - Military/Govt/Enterprise publishing
  Ref:   http://xml.coverpages.org/mil-std-38784-a1-dtd.txt [4]

I'd highly recommend DocBook as a usable, technically focused XML DTD with strong toolset support.

[1] http://www.xmlxperts.com/bookdtd.htm
[2] http://docbook.org
[3] http://www.tei-c.org/
[4] http://xml.coverpages.org/mil-std-38784-a1-dtd.txt

------------------------------------------------------------------------

May 18, 2005 - 01:37 : puregin

I'd suggest we start with something very simple. The patch which I submitted for http://drupal.org/node/1898 wraps each node in <div> tags, with a level and a node id attribute, for printer-friendly output.

We can't rely in general on the contents of a node being XHTML, even if we force output through an XHTML validator such as tidy. So our best bet is to encode the entire contents of a node as CDATA.
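To make the CDATA idea concrete, here is a minimal sketch in Python rather than the patch's PHP. The helper names `wrap_cdata` and `section`, and the `id`/`class` attribute names on the <div>, are assumptions made for illustration and are not taken from the patch. The one subtlety it shows is that a CDATA section cannot itself contain the sequence `]]>`, so that sequence has to be split across two adjacent sections.

```python
from xml.dom import minidom


def wrap_cdata(text):
    """Wrap arbitrary node content in a CDATA section.

    A CDATA section cannot contain ']]>', so each occurrence is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def section(nid, level, body):
    """Build one <div> wrapper (attribute names are hypothetical)."""
    return '<div id="node-%d" class="level-%d">%s</div>' % (
        nid, level, wrap_cdata(body))


# Content that is not well-formed XML on its own: unclosed <p>, bare '&',
# and an embedded ']]>'.
body = "<p>Tom & Jerry ]]> unclosed"
wrapped = section(12, 1, body)

# The wrapper parses as XML, and the original content comes back intact.
div = minidom.parseString(wrapped).documentElement
recovered = "".join(child.data for child in div.childNodes)
print(recovered == body)
```

The parser sees only character data inside the <div>, so whatever markup the node body contains never has to be valid.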
This gives us hierarchy and encapsulated contents (of any kind - later this could also be other kinds of data or markup).

This output will be valid XML, with a pretty simple DTD. It is easy to take such a file and write simple XSLT-based scripts on the client side to explode this file into a directory tree of HTML, a single HTML file, or many other formats.

Importing is trickier. It's relatively easy to import an exported file and update the nodes of the book according to the hierarchy defined by the sectional <div> elements. Importing needs to take care of structure which has changed - child nodes added, deleted, or moved.

It would also be nice to have some client-side scripts to import other formats into this nested sectional <div> based format - for example, to take a directory tree of HTML fragments and make this into an importable file.

------------------------------------------------------------------------

June 1, 2005 - 03:33 : puregin

Attachment: http://drupal.org/files/issues/xml-export.patch (14.06 KB)

This patch enables export of books as XML documents. The XML is DocBook "at the level of structure", but node contents are wrapped as CDATA, since we can't be sure that the contents are valid XML.

Several other bugs/feature requests are also addressed with this patch:

- Fixes bugs
    http://drupal.org/node/1898
    http://drupal.org/node/1482
    http://drupal.org/node/8049
    http://drupal.org/node/1899
  Should go a long way towards implementing feature request http://drupal.org/node/2062
  It should also be easy to extend this to produce OPML, for example.
- Adds about 170 lines, of which more than 100 are comments
- Added doxygen comments
- Made doxygen comment format consistent; fixed minor grammatical slips
- A proper Doctype and more informative HTML element is generated for printer-friendly HTML output.
- Refactored book_print() to use book_recurse().
- Refactored book_recurse().
  Applies 'visitor' callback functions to nodes during weight/title order tree-traversal. The parameterized visitor callbacks can be used to generate different kinds of output. There are many other kinds of operations on books which can be implemented by writing a pre-node/post-node pair of callback functions: word-count/statistics gathering, comparison, copying, search and replace...
- Introduced book_export() which uses book_recurse() to generate DocBook-like XML to export book contents in a structured form. An md5 hash is computed for each node to help import code to decide if a node needs to be updated or not.

------------------------------------------------------------------------

June 3, 2005 - 01:29 : puregin

Attachment: http://drupal.org/files/issues/xml-export-01.patch (14.19 KB)

This updated patch adds "weight" metadata, which I forgot to capture in the previous patch. I'm not sure how much other metadata I should include.
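
For readers who want to see the shape of the pre-node/post-node visitor traversal and the md5/weight metadata described in the comments above, here is a rough sketch in Python rather than PHP. It is not the code from the patch: the `Node` class, the `recurse()` signature, the `md5` and `weight` attributes, and the use of <literallayout> are all assumptions made for illustration. It shows one traversal routine driving two different visitor pairs, an XML export and a word count.

```python
import hashlib
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Node:
    """Hypothetical stand-in for a book node."""
    nid: int
    title: str
    body: str
    weight: int = 0
    children: list = field(default_factory=list)


def recurse(node, depth, pre, post):
    """Depth-first walk: pre(node, depth) runs before the children and
    post(node, depth) after them; siblings go in (weight, title) order."""
    result = pre(node, depth)
    for child in sorted(node.children, key=lambda n: (n.weight, n.title)):
        result += recurse(child, depth + 1, pre, post)
    return result + post(node, depth)


def cdata(text):
    # ']]>' cannot appear inside a CDATA section, so split it across two.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def export_pre(node, depth):
    """Open a section and record an md5 of the body plus the weight."""
    pad = "  " * depth
    digest = hashlib.md5(node.body.encode("utf-8")).hexdigest()
    return (
        f'{pad}<section id="node-{node.nid}" md5="{digest}" '
        f'weight="{node.weight}">\n'
        f"{pad}  <title>{escape(node.title)}</title>\n"
        f"{pad}  <literallayout>{cdata(node.body)}</literallayout>\n"
    )


def export_post(node, depth):
    return "  " * depth + "</section>\n"


def count_pre(node, depth):
    return len(node.body.split())


def count_post(node, depth):
    return 0


book = Node(1, "Handbook", "Welcome to the book.", 0, [
    Node(3, "Usage", "Run the script.", 1),
    Node(2, "Install", "Unpack & copy <files>.", 0),
    Node(4, "About", "Short note.", 0),
])

exported = recurse(book, 0, export_pre, export_post)  # XML as a string
words = recurse(book, 0, count_pre, count_post)       # total word count
print(exported)
print(words)
```

An importer could recompute the md5 of each stored node body and compare it with the recorded value to decide whether the node needs updating, which is the purpose the hash serves in the June 1 comment.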