[drupal-devel] [feature] export book as xml for formatting
puregin
drupal-devel at drupal.org
Wed Jun 1 10:33:14 UTC 2005
Issue status update for http://drupal.org/node/1482
Project: Drupal
Version: cvs
Component: book.module
Category: feature requests
Priority: normal
Assigned to: Anonymous
Reported by: BenEng
Updated by: puregin
-Status: active
+Status: patch
Attachment: http://drupal.org/files/issues/xml-export.patch (14.06 KB)
This patch enables export of books as XML documents.
The XML is DocBook "at the level of structure", but
node contents are wrapped as CDATA, since we
can't be sure that the contents are valid XML.
Several other bugs/feature requests are also
addressed with this patch:
- Fixes bugs:
    http://drupal.org/node/1898
    http://drupal.org/node/1482
    http://drupal.org/node/8049
    http://drupal.org/node/1899
- Should go a long way towards implementing feature request
  http://drupal.org/node/2062
- It should also be easy to extend this to produce OPML, for example.
- Adds about 170 lines, of which more than 100 are comments
- Added doxygen comments
- Made doxygen comment format consistent; fixed minor grammatical
slips
- A proper Doctype and more informative HTML element is generated
for printer-friendly HTML output.
- Refactored book_print() to use book_recurse().
- Refactored book_recurse(). Applies 'visitor' callback functions to
nodes during weight/title order tree-traversal. The parameterized
visitor callbacks can be used to generate different kinds of output.
There are many other kinds of operations on books which can be
implemented by writing a pre-node/post-node pair of callback functions:
word-count/statistics gathering, comparison, copying, search and
replace...
- Introduced book_export() which uses book_recurse() to generate
DocBook-like XML to export book contents in a structured form.
An md5 hash is computed for each node to help import code
decide whether a node needs to be updated.
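To illustrate the pre-node/post-node visitor idea described above, here is a
minimal sketch in Python rather than the module's PHP. It is not the code from
the patch: the Node class, recurse() and the two visitor pairs are hypothetical
names, and the only behaviour taken from the description is that siblings are
visited in weight/title order with one callback before and one after each node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a book page."""
    nid: int
    title: str
    body: str = ""
    weight: int = 0
    children: List["Node"] = field(default_factory=list)


Visitor = Callable[[Node, int], str]


def recurse(node: Node, depth: int, pre: Visitor, post: Visitor) -> str:
    """Depth-first walk; siblings are visited in (weight, title) order."""
    out = pre(node, depth)
    for child in sorted(node.children, key=lambda n: (n.weight, n.title)):
        out += recurse(child, depth + 1, pre, post)
    return out + post(node, depth)


# Visitor pair 1: an indented outline of titles.
def outline_pre(node: Node, depth: int) -> str:
    return "  " * depth + node.title + "\n"


def outline_post(node: Node, depth: int) -> str:
    return ""


# Visitor pair 2: statistics gathering; produces no output, only a total.
def make_word_counter():
    totals = {"words": 0}

    def pre(node: Node, depth: int) -> str:
        totals["words"] += len(node.body.split())
        return ""

    def post(node: Node, depth: int) -> str:
        return ""

    return totals, pre, post


book = Node(1, "Handbook", "Welcome to the handbook", children=[
    Node(3, "Usage", "How to use it", weight=1),
    Node(2, "Install", "Unpack and configure", weight=0),
    Node(4, "Appendix", "Extra notes", weight=1),
])

outline = recurse(book, 0, outline_pre, outline_post)
totals, count_pre, count_post = make_word_counter()
recurse(book, 0, count_pre, count_post)
```

The traversal itself never changes; swapping the callback pair is what turns
the same walk into an outline, a word count, or an exporter.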
puregin
Previous comments:
------------------------------------------------------------------------
April 5, 2003 - 14:38 : BenEng
Export an entire collaborative book to xml (docbook) so that it can be
formatted using xslt (e.g., to rtf or pdf) and printed.
It would also be nice to be able to perform the reverse. That is to
import a collaborative book from an xml (docbook) file that is
uploaded.
------------------------------------------------------------------------
June 2, 2004 - 17:10 : moshe weitzman
can someone suggest an xml schema for this? i think we need a general
xml schema for nodes. after that, this becomes a simple matter of
nesting node elements (I think)
------------------------------------------------------------------------
February 1, 2005 - 02:42 : Teto
Hi,
Is there any news about such a feature?
All I've found about a docbook schema is here:
http://docbook.sourceforge.net/projects/schema/
It seems there isn't much in the docbook cvs about that.
Teto.
------------------------------------------------------------------------
May 14, 2005 - 02:11 : puregin
Here's a list of the Book publishing DTDs I know about:
Name:  ISO 12083:1998//DTD Book//EN - this includes ISO 12093:1993//DTD
       Mathematics//EN
Notes: Committee standard - very general. Used e.g. by University of
       California Press.
Ref:   www.xmlxperts.com/bookdtd.htm [1]
Name:  DocBook
Notes: Applications - widely used by computer book publishers, e.g.
       O'Reilly. Good support.
Ref:   docbook.org [2]
Name:  TEI/TEI-Lite
Notes: Applications - scholarly/historical/literary documents
Ref:   www.tei-c.org [3]
Name:  MIL-STD-38784 (CALS)
Notes: Applications - Military/Govt/Enterprise publishing
Ref:   http://xml.coverpages.org/mil-std-38784-a1-dtd.txt [4]
I'd highly recommend DocBook as a useable, technically focused, XML DTD
with strong toolset support.
[1] http://www.xmlxperts.com/bookdtd.htm
[2] http://docbook.org
[3] http://www.tei-c.org/
[4] http://xml.coverpages.org/mil-std-38784-a1-dtd.txt
------------------------------------------------------------------------
May 18, 2005 - 01:37 : puregin
I'd suggest we start with something very simple.
The patch which I submitted for http://drupal.org/node/1898 wraps each
node in <div> tags, with a level, and a node id attribute, for printer
friendly output.
We can't rely in general on the contents of a node being XHTML, even if
we force output through an XHTML validator such as tidy. So our best
bet is to encode the entire contents of a node as CDATA. This gives us
hierarchy and encapsulated contents (of any kind; later this could
also be other kinds of data or markup).
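As a rough illustration of the CDATA approach, the Python sketch below wraps
arbitrary node contents in a sectional <div>. The function names and the
attribute names nid and level are assumptions made for this example, not the
patch's actual output format. The one subtlety is that the contents may
themselves contain "]]>", which would end the CDATA section early, so that
sequence is split across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def section(nid: int, level: int, body: str) -> str:
    """One book node as a <div>; attribute names are hypothetical."""
    return f'<div nid="{nid}" level="{level}">{cdata(body)}</div>'


# Deliberately malformed markup: unclosed tags, a bare '&', a stray ']]>'.
body = "<p>Unclosed <br> tag & a stray ]]> marker"
xml = section(42, 1, body)

# The wrapper parses as XML and the contents come back unchanged.
parsed = ET.fromstring(xml)
```

CDATA does not help with characters that XML forbids outright (most control
characters), so an exporter would still need to filter those.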
This output will be valid XML, with a pretty simple DTD. It is easy to
take such a file and write simple XSLT-based scripts on the client
side to explode this file into a directory tree of HTML, a single HTML
file, or many other formats.
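The "explode into a directory tree" idea can be sketched as follows. This
uses Python's ElementTree in place of XSLT, purely for illustration; the
sample document, the nid and level attributes and the node-N/index.html path
layout are all assumptions. The sketch builds a mapping from path to page
contents instead of writing files.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: nested <div> sections with CDATA contents.
SAMPLE = """<div nid="1" level="0"><![CDATA[<h1>Handbook</h1>]]>
  <div nid="2" level="1"><![CDATA[<p>Install</p>]]></div>
  <div nid="3" level="1"><![CDATA[<p>Usage</p>]]>
    <div nid="4" level="2"><![CDATA[<p>Tips</p>]]></div>
  </div>
</div>"""


def explode(elem: ET.Element, prefix: str = "") -> dict:
    """Map each section to a path that mirrors the book hierarchy."""
    path = f"{prefix}node-{elem.get('nid')}"
    pages = {path + "/index.html": (elem.text or "").strip()}
    for child in elem.findall("div"):
        pages.update(explode(child, path + "/"))
    return pages


pages = explode(ET.fromstring(SAMPLE))
```

Writing each entry of pages to disk would give the directory tree;
concatenating the values in document order would give the single HTML file.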
Importing is trickier. It's relatively easy to import an exported
file, and update the nodes of the book according to the hierarchy
defined by the sectional <div> elements. Importing needs to take care
of structure which has changed - child nodes added, deleted, or moved.
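One way an importer might detect those structural changes is sketched below
in Python. It is only an assumption about how import could work, not part of
the patch: each book is reduced to a hypothetical mapping from node id to
(parent id, contents), and an md5 digest of the contents decides whether a
node's text has changed.

```python
import hashlib


def digest(body: str) -> str:
    """md5 hex digest of the node contents."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def diff_books(current: dict, imported: dict) -> dict:
    """Compare two books given as {nid: (parent_nid, body)}."""
    result = {
        "added": sorted(imported.keys() - current.keys()),
        "deleted": sorted(current.keys() - imported.keys()),
        "moved": [],
        "changed": [],
    }
    for nid in sorted(current.keys() & imported.keys()):
        old_parent, old_body = current[nid]
        new_parent, new_body = imported[nid]
        if old_parent != new_parent:
            result["moved"].append(nid)      # same node, new parent
        if digest(old_body) != digest(new_body):
            result["changed"].append(nid)    # contents need updating
    return result


current = {1: (0, "Intro"), 2: (1, "Install"), 3: (1, "Usage"), 4: (3, "Tips")}
imported = {1: (0, "Intro"), 2: (1, "Install v2"), 4: (1, "Tips"), 5: (1, "FAQ")}
changes = diff_books(current, imported)
```

If the export already stores a hash per node, the importer could compare the
stored values directly and skip rehashing unchanged contents.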
It would also be nice to have some client-side scripts to import other
formats into this nested sectional <div> based format - for example, to
take a directory tree of HTML fragments, and make this into an
importable file.