[drupal-devel] [bug] Strange markup in book render functions

Steven drupal-devel at drupal.org
Sat May 7 18:40:25 UTC 2005

Issue status update for http://drupal.org/node/1898

 Project:      Drupal
 Version:      cvs
 Component:    book.module
 Category:     bug reports
 Priority:     critical
 Assigned to:  puregin
 Reported by:  kika
 Updated by:   Steven
 Status:       patch

Puregin: nope, the code correctly handles missing h numbers. If the book
page only uses h2, then that will be the first new level. If it uses h2
and h4, and it is inside a 3rd level book page for example (whose title
would be h3), then they get renumbered to h4 and h5 respectively (no
missing number).

From a semantic point of view it is crazy not to use h# for book


Previous comments:

June 10, 2003 - 00:06 : kika

Snippet from book_render():

$output .= "<dt>". l($node->title, "node/view/$node->nid")
."</dt><dd>". book_body($node) ."<br /><br /></dd>";

Is using a definition list here really correct? This should be
replaced with appropriate divs or, even better, funneled through theme().

Same goes for the function book_print():

  $output .= "$node->title";

  if ($node->body) {
    $output .= "<ul>". book_body($node) ."</ul>";
  }

Why wrap body inside <ul>?


June 11, 2003 - 02:31 : al

Yeah, it's all nasty. I patched this ages ago, but no one seemed
interested. The patch now won't apply due to all the changes since

I'm on this - expect a new patch to hit my sandbox shortly.


June 11, 2003 - 08:03 : ax

please remove the superfluous <ul>s and <li> [1], too.

to solve the problem of invalid nested <hx> tags, maybe we should
change <h$depth> to <h1 class="book-h$depth">? this way, we
could use <h2> - <h6> in book pages. alternatively, we should only use
<h5> and <h6> in book pages - we shouldn't have nestings deeper than 4
in printed books, then.

additionally, i think it would be a good idea to add id's to all book
headings so they can be jumped to via fragment identifiers
(url#fragment-identifier). i often wish this would be possible when
referencing the last paragraph of a long page of documentation ...
these id's should be of the form <hx id="foo" name="foo"> [2], with foo
probably being the node-id of the single book page.

adding id's to headings would also be a good idea for headings /
sections in single book nodes. for example, it would be nice if i could
link to list point 10 in the drupal installation instructions [3] via
something like url#10. these id's had to be added manually. the
question is: how can these id's be named to be unique within a
/printed/ book? probably <node_id>_<section_id> ...

[1] [ [drupal-devel] Drupal handbook formatting and heading tags |
http://lists.drupal.org/pipermail/drupal-devel/2003-June/025860.html ]
[2] [ XHTML Spec, HTML Compatibility Guidelines, Fragment Identifiers |
http://www.w3.org/TR/xhtml1/#C_8 ]
[3] http://drupal.org/node/view/260
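
The <node_id>_<section_id> naming idea above could be sketched roughly as
follows. The helper names book_section_id() and book_heading() and the
sample title are hypothetical, not part of book.module. The sketch puts a
'node-' prefix in front because HTML 4 / XHTML 1 id values must begin with
a letter, and it emits only the id attribute.

```php
<?php
// Hypothetical helper (not in book.module): builds a fragment identifier
// that stays unique when several book pages are printed as one document.
function book_section_id($nid, $section) {
  // Start with a letter: HTML 4 / XHTML 1 ids may not begin with a digit.
  $section = preg_replace('/[^A-Za-z0-9_-]+/', '-', (string) $section);
  return 'node-' . (int) $nid . '-' . $section;
}

// Hypothetical helper: emits a heading that can be targeted via url#id.
function book_heading($level, $nid, $section, $title) {
  $id = book_section_id($nid, $section);
  return "<h$level id=\"$id\">" . htmlspecialchars($title) . "</h$level>";
}

// Sample values are made up for illustration.
echo book_heading(2, 260, 10, 'Example step'), "\n";
```

With ids built this way, a link such as node/view/260#node-260-10 would
still resolve to one element after the pages are merged for printing.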


June 15, 2003 - 16:23 : al

Fixed in latest CVS.
Uses all of Ax's suggestions:
 - <h1 class="depthX">
 - Fixed to use CSS rather than tables.
 - Changes link at top to standard "breadcrumb" style, rather than
hierarchy in tree view.
 - Removes odd and superfluous <ul> tags and the like.
 - Makes printed docs have proper <html> and <body> wrapping, etc.


May 5, 2005 - 09:20 : puregin

I'm re-opening this, because the problem persists in CVS.

1) The current CVS version of the book module still wraps
$node->body in an <ul> element in book_print_recurse().  This produces
non-conformant HTML.

I'm not sure why this was done in the first place - I'm guessing
someone wanted to insulate from having bad markup in a book page throw
off the enclosing page.

Getting rid of the offending UL tags is a one-line solution, but what
is the right thing to do here?

I'd like to see something very structural - wrapping the /entire/
output of book_print_recurse() (including the generated sectional
header) in a <div> element.  This will properly nest true structural
sections, making the structure accessible to applications via DOM.  It
will also make it easier to use XSLT, for example, to transform the
output into various XML-based export formats.  Any reason not to do so?

Also...  when we get the 'printer-friendly' version of a page, the
subtree rooted at that page is returned, and the root page is marked up
with a class="book-h1".   Is there any reason why we don't compute the
depth of the root page, and use that to generate the "in-place" header?
  In other words, if a section is "book-h3" relative to the top-level
book, shouldn't it be "book-h3" whenever it is viewed?

In this case, we could apply the same principle to generating the <div>
tags around each section.


May 6, 2005 - 00:58 : puregin

Attachment: http://drupal.org/files/issues/book_printer_friendly_div.patch (1.57 KB)

I am attaching a patch which fixes improper printer-friendly output, as

The patch fixes book_print() and book_print_recurse() to insert a <div>
start tag before emitting the header, and to close it after
subsections have been recursively generated.

The <div> tag has an id attribute of the form id="sect-123", where the
node's nid is 123, and a class attribute of the form class="sect/n/",
where /n/ is the depth of the section.
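
As a rough illustration of the nesting described above (not the attached
patch itself), the recursion might look like the sketch below. The
function sketch_print_recurse() and the array-based node layout ('nid',
'title', 'body', 'children') are assumptions made for the example. The
real code works on Drupal node objects.

```php
<?php
// Hypothetical stand-in for book_print_recurse(): $node is a plain array,
// not a real Drupal node object.
function sketch_print_recurse($node, $depth = 1) {
  // Open the section before the header so the header sits inside it.
  $output  = "<div id=\"sect-{$node['nid']}\" class=\"sect$depth\">\n";
  $output .= "<h1 class=\"book-h$depth\">"
    . htmlspecialchars($node['title']) . "</h1>\n";
  $output .= $node['body'] . "\n";
  foreach ($node['children'] as $child) {
    $output .= sketch_print_recurse($child, $depth + 1);
  }
  // Close the section only after all subsections have been emitted.
  return $output . "</div>\n";
}

// Made-up sample data for illustration.
$book = array(
  'nid' => 123, 'title' => 'Handbook', 'body' => '<p>Intro</p>',
  'children' => array(
    array('nid' => 124, 'title' => 'Install', 'body' => '<p>Steps</p>',
          'children' => array()),
  ),
);
echo sketch_print_recurse($book);
```

Each child section ends up inside its parent's <div>, so the document
tree mirrors the book hierarchy.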

I'm marking this issue as critical, since the present HTML output plays
havoc with any attempts at sane PDF generation.

I will address the issue of where printer-friendly subsections should
be 'rooted' in a follow-up.



May 6, 2005 - 10:53 : Dries

Let's not abbreviate 'section' to 'sect'.  Also, we use '-' to separate
words in CSS names.  

- If the section ID uses the node ID, maybe we should use 'node-'
instead of 'sect-'?

- If the class uses depth, maybe we should use 'depth-' instead of

The same is true for your other patch where you suggest to use 'n'. 
Better to make that 'node-'.  (Feel free to merge both patches.)

Looks like the book.module generates a _lot_ of CSS IDs/classes. If the
introduction of these new IDs/classes allows us to remove some of the
existing IDs/classes, please do.


May 7, 2005 - 06:26 : Steven

Here's my idea.. why not use regular h tags for the book structure, and
renumber the header tags in the content, starting one level after the
current book depth?

Here's a function which does this:

function fix_up_headers($text, $level = 1) {
  // Find all header tags that are used (if any)
  if (preg_match_all('/<h[0-9]+/i', $text, $tags) == 0) {
    return $text;
  }
  // Discard duplicates and sort them by number
  $tags = array_unique(array_map('strtolower', $tags[0]));
  natsort($tags);
  // Renumber them and replace them
  $from = $to = array();
  $i = $level;
  foreach ($tags as $tag) {
    $from[] = '@'. preg_quote($tag) .'(?![0-9])@i';
    $from[] = '@'. preg_quote(str_replace('<', '</', $tag)) .'(?![0-9])@i';
    $to[] = '<h'. $i;
    $to[] = '</h'. $i;
    $i++;
  }
  $text = preg_replace($from, $to, $text);
  // Change level 7 headers and higher to divs.
  return preg_replace(array('!<h([7-9]|[1-9][0-9]+)!i',
    '!</h([7-9]|[1-9][0-9]+)!i'), array('<div class="header-\1"', '</div'),
    $text);
}

That way you can use header tags safely in the content (which is useful
in the context of a single page) without messing up the book output.
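
One thing to watch with this approach: when preg_replace() is given
arrays of patterns, it applies them one after another, each to the result
of the previous one, so a tag renumbered by an earlier pattern can be
matched again by a later one. The sketch below is a hypothetical
single-pass variant (the name fix_up_headers_single_pass() is made up)
that builds an old-to-new level map first and rewrites every tag exactly
once. It assumes PHP 5.3 or later for the closure.

```php
<?php
// Hypothetical single-pass variant of the renumbering idea above.
function fix_up_headers_single_pass($text, $level = 1) {
  // Collect the distinct header levels used in the text.
  if (!preg_match_all('/<h([0-9]+)/i', $text, $matches)) {
    return $text;
  }
  $levels = array_unique(array_map('intval', $matches[1]));
  sort($levels, SORT_NUMERIC);
  // Map each used level to consecutive levels starting at $level.
  $map = array();
  foreach ($levels as $offset => $old) {
    $map[$old] = $level + $offset;
  }
  // Rewrite opening and closing tags in one pass, so a tag that has
  // already been renumbered is never matched a second time.
  return preg_replace_callback('@<(/?)h([0-9]+)@i', function ($m) use ($map) {
    $old = (int) $m[2];
    if (!isset($map[$old])) {
      return $m[0]; // stray closing tag with no opening tag: leave as-is
    }
    $new = $map[$old];
    if ($new <= 6) {
      return '<' . $m[1] . 'h' . $new;
    }
    // HTML has no h7 and above: fall back to a div.
    return $m[1] === '/' ? '</div' : '<div class="header-' . $new . '"';
  }, $text);
}

// h2 and h4 inside a third-level book page start at level 4.
echo fix_up_headers_single_pass('<h2>A</h2><h4>B</h4>', 4), "\n";
```

For the h2/h4 case this keeps the two headings distinct at h4 and h5.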


May 7, 2005 - 09:50 : puregin

Attachment: http://drupal.org/files/issues/02_book_printer_friendly_div.patch (1.78 KB)

The attached patch changes the div id to the form 'node-123'.  I was
concerned that a document fragment might be created with both a book
section and a non-book section from the same node, leading to
non-unique id's, but at this point it seems like an unlikely scenario.

I've changed 'class=sect1' to 'class=section-1'.  I was trying to be
lazy and use the DocBook section name convention to make XSLT
conversion a little easier, but it's really not saving any effort, in
the end, and this is clearer.

I think 'section' rather than 'depth' is appropriate - it describes
what the role of the div element is.

I've closed the other issue regarding id's in header elements, because
I've replaced the whole 'class="book-hx"' business in headers with
'class="book-header"'.  Since the level can now be selected by context
with respect to the enclosing div, separate classes for headers are not
required.  That is, we can style with CSS like the following:

.section-1 h1.book-header {
    font-weight: bold;
    font-size: 2.2em;
}
.section-2 h1.book-header {
    font-size: 2em;
}
.section-1 p  {
    text-align: justify;
}
.section-3 p  {
    margin-left: 3em;
    margin-right: 3em;
}

Note that the section-2 h1 will be bold, since the section-2 div is
nested within the section-1 div.  Also, all paragraphs regardless of
section will be justified; and section-3 paragraphs will be indented
left and right.

Because we can style /everything/ by section, not just headers, and
because the styling specifications respect the sectional hierarchy,
it's possible to create very elegant and sophisticated book pages with
very little CSS.  Even I can do it! :)  I'm definitely *not* a
designer, but I've played around a bit with this - have a look at
http://www.puregin.org/book/print/129 for an example.

I will attach patches for misc/print.css and misc/drupal.css
separately.  I don't understand why, but the present version (cvs) of
these files has all of the h1.book-hx selectors in drupal.css - the
printer-friendly page actually looks at print.css.  Am I missing

For the record, this approach seems to have been suggested by clairem
in http://drupal.org/node/8049 where I have a little rant about
headings in book pages.  Twice :)



May 7, 2005 - 10:10 : puregin

Steven - that's a neat idea, and very clever, clean code!  But doesn't
it still miss problems such as, for example, an h5 inside of an h1
section with no intervening h2..h4 headers?

I'm inclined to be a bit suspicious of the approach, though - I rather
think it's 'doing magic behind the user's back'.   If people are going
to insist on putting headings inside of pages, it should be possible. 
And it would be odd for users to discover that what they typed isn't
what they get.


May 7, 2005 - 10:32 : puregin

Attachment: http://drupal.org/files/issues/print_css_div.patch (1.36 KB)

Here's the patch for print.css.  

There are a couple of commented-out lines - these show how to make
section headings 'run in' as part of the following paragraph (or other
inline element).  I am assuming that print.css is supposed to be
something of an example that people would use to experiment with? 
Please feel free to improve this in any way - as I said, I'm far from
being a graphic designer, and I'm also not really sure what the
intention of this file is.
