[drupal-devel] [bug] Strange markup in book render functions

puregin drupal-devel at drupal.org
Sat May 7 07:50:36 UTC 2005


Issue status update for http://drupal.org/node/1898

 Project:      Drupal
 Version:      cvs
 Component:    book.module
 Category:     bug reports
 Priority:     critical
 Assigned to:  puregin
 Reported by:  kika
 Updated by:   puregin
 Status:       patch
 Attachment:   http://drupal.org/files/issues/02_book_printer_friendly_div.patch (1.78 KB)

The attached patch changes the div id to the form 'node-123'.  I was
concerned that a document fragment might contain both a book section
and a non-book section from the same node, leading to non-unique ids,
but at this point that seems an unlikely scenario.


I've changed 'class=sect1' to 'class=section-1'.  I was trying to be
lazy and use the DocBook section name convention to make XSLT
conversion a little easier, but it's really not saving any effort, in
the end, and this is clearer.


I think 'section' rather than 'depth' is appropriate - it describes
what the role of the div element is.


I've closed the other issue regarding ids in header elements, because
I've replaced the whole 'class="book-hx"' business in headers with
'class="book-header"'.  Since the level can now be selected by context
with respect to the enclosing div, separate classes for headers are not
required.  That is, we can style with CSS like the following:


.section-1 h1.book-header {
    font-weight: bold;
    font-size: 2.2em;
}
.section-2 h1.book-header {
    font-size: 2em;
}
.section-1 p  {
    text-align: justify;
}
.section-3 p  {
    margin-left: 3em;
    margin-right: 3em;
}


Note that the section-2 h1 will be bold, since the section-2 div is
nested within the section-1 div.  Also, all paragraphs regardless of
section will be justified; and section-3 paragraphs will be indented
left and right.
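The nested markup those rules rely on can be sketched as a recursive renderer: each section's div is opened before its header and closed only after all of its subsections, so section-2 divs sit inside section-1 divs. This is a minimal sketch, not the patch itself; the function name render_section() and the array keys 'nid', 'title', 'body' and 'children' are hypothetical stand-ins for book_print_recurse() and Drupal's node objects.

```php
<?php
// Hypothetical stand-in for book_print_recurse(): $node is an array with
// 'nid', 'title', 'body' and 'children' keys (an assumption, not Drupal's API).
function render_section(array $node, int $depth = 1): string {
  // Open the section div; it stays open until all subsections are emitted.
  $output = '<div id="node-' . $node['nid'] . '" class="section-' . $depth . '">';
  // One header class for every level; CSS picks the level from the enclosing div.
  $output .= '<h1 class="book-header">' . htmlspecialchars($node['title']) . '</h1>';
  // The body is emitted as-is, with no wrapping <ul>.
  $output .= $node['body'];
  foreach ($node['children'] as $child) {
    $output .= render_section($child, $depth + 1);
  }
  // Closing after the children makes the divs nest like the book outline.
  return $output . '</div>';
}
```

With this structure, a selector such as `.section-1 p` reaches every paragraph in the book, because every deeper section is a descendant of the section-1 div.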


Because we can style /everything/ by section, not just headers, and
because the styling specifications respect the sectional hierarchy,
it's possible to create very elegant and sophisticated book pages with
very little CSS.  Even I can do it! :)  I'm definitely *not* a
designer, but I've played around a bit with this - have a look at
http://www.puregin.org/book/print/129 for an example.


I will attach patches for misc/print.css and misc/drupal.css
separately.  I don't understand why, but the present version (cvs) of
these files has all of the h1.book-hx selectors in drupal.css - the
printer-friendly page actually looks at print.css.  Am I missing
something?


For the record, this approach seems to have been suggested by clairem
in http://drupal.org/node/8049 where I have a little rant about
headings in book pages.  Twice :)


Djun




puregin



Previous comments:
------------------------------------------------------------------------

June 9, 2003 - 14:06 : kika

Snippet from book_render():


$output .= "<dt>". l($node->title, "node/view/$node->nid")
."</dt><dd>". book_body($node) ."<br /><br /></dd>";


Is using a definition list here really correct? This should be
replaced with appropriate divs, or even better, funneled through theme().


Same goes for the function book_print():


 $output .= "$node->title";


  if ($node->body) {
    $output .= "<ul>". book_body($node) ."</ul>";
  }


Why wrap body inside <ul>?




------------------------------------------------------------------------

June 10, 2003 - 16:31 : al

Yeah, it's all nasty. I patched this ages ago, but no one seemed
interested. The patch now won't apply due to all the changes since
then. 


I'm on this - expect a new patch to hit my sandbox shortly.




------------------------------------------------------------------------

June 10, 2003 - 22:03 : ax

please remove the superfluous <ul>s and <li> [1], too.


to solve the problem of invalid nested <hx> tags, maybe we should
change <h$depth> to <h1 class="book-h$depth">? this way, we
could use <h2> - <h6> in book pages. alternatively, we should only use
<h5> and <h6> in book pages - we shouldn't have nestings deeper than 4
in printed books, then.


additionally, i think it would be a good idea to add id's to all book
headings so they can be jumped to via fragment identifiers
(url#fragment-identifier). i often wish this would be possible when
referencing the last paragraph of a long page of documentation ...
these ids should be of the form <hx id="foo" name="foo"> [2], with foo
probably being the node-id of the single book page.
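A minimal sketch of the two suggestions above combined: the depth goes into a class on a single h1 element rather than into the tag name, and the heading carries an id derived from the node id so that url#fragment links can jump to it. The function names book_heading() and book_heading_link() and the 'node-' prefix are hypothetical; the prefix is there because HTML 4 and XHTML 1.0 ids may not begin with a digit, so a bare node id cannot be used as-is.

```php
<?php
// Hypothetical helper: one <h1> per book page, depth in the class,
// fragment identifier derived from the node id.
function book_heading(int $nid, int $depth, string $title): string {
  return '<h1 class="book-h' . $depth . '" id="node-' . $nid . '">'
    . htmlspecialchars($title) . '</h1>';
}

// Hypothetical helper: a link that jumps straight to that heading.
function book_heading_link(string $url, int $nid, string $text): string {
  return '<a href="' . htmlspecialchars($url) . '#node-' . $nid . '">'
    . htmlspecialchars($text) . '</a>';
}
```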


adding id's to headings would also be a good idea for headings /
sections in single book nodes. for example, it would be nice if i could
link to list point 10 in the drupal installation instructions [3] via
something like url#10. these ids would have to be added manually. the
question is: how can these ids be named so that they are unique within a
/printed/ book? probably <node_id>_<section_id> ...
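One way to follow the <node_id>_<section_id> idea is to prefix every id inside a page's body with that page's node id before the pages are concatenated into a printed book, and to rewrite same-page links to match. A minimal sketch only: the function name prefix_fragment_ids() and the 'n' prefix are hypothetical, and it assumes ids and links are written as id="..." and href="#..." with double quotes, which a real implementation could not rely on.

```php
<?php
// Hypothetical helper: make hand-written ids in one page's body unique
// within a printed book by prefixing them with the page's node id.
// Assumes double-quoted attributes; a robust version would parse the HTML.
function prefix_fragment_ids(string $body, int $nid): string {
  $prefix = 'n' . $nid . '_';
  // id="10" becomes id="n260_10".
  $body = preg_replace('/\bid="([^"]+)"/', 'id="' . $prefix . '$1"', $body);
  // Same-page links, href="#10", must point at the renamed ids.
  return preg_replace('/\bhref="#([^"]+)"/', 'href="#' . $prefix . '$1"', $body);
}
```

Links from other pages (href="page#10") are left alone here; they would need the same prefix applied wherever they are generated.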


[1] [ [drupal-devel] Drupal handbook formatting and heading tags |
http://lists.drupal.org/pipermail/drupal-devel/2003-June/025860.html ]
[2] [ XHTML Spec, HTML Compatibility Guidelines, Fragment Identifiers |
http://www.w3.org/TR/xhtml1/#C_8 ]
[3] http://drupal.org/node/view/260




------------------------------------------------------------------------

June 15, 2003 - 06:23 : al

Fixed in latest CVS.
Uses all of Ax's suggestions:
 - <h1 class="depthX">
 - Fixed to use CSS rather than tables.
 - Changes link at top to standard "breadcrumb" style, rather than
hierarchy in tree view.
 - Removes odd and superfluous <ul> tags and the like.
 - Makes printed docs have proper <html> and <body> wrapping, etc.




------------------------------------------------------------------------

May 4, 2005 - 23:20 : puregin

I'm re-opening this, because the problem persists in CVS.


1) The current CVS version (1.288.2.4) of book module still wraps
$node->body in a <ul> element in book_print_recurse().  This produces
non-conformant HTML.


I'm not sure why this was done in the first place - I'm guessing
someone wanted to insulate from having bad markup in a book page throw
off the enclosing page.


Getting rid of the offending UL tags is a one-line solution, but what
is the right thing to do here?


I'd like to see something very structural - wrapping the /entire/
output of book_print_recurse() (including the generated section
header) in a <div> element.  This will properly nest true structural
sections, making the structure accessible to applications via the DOM.
It will also make it easier to use XSLT, for example, to transform
into various XML-based export formats.  Any reason not to do so?


Also...  when we get the 'printer-friendly' version of a page, the
subtree rooted at that page is returned, and the root page is marked up
with class="book-h1".  Is there any reason why we don't compute the
depth of the root page, and use that to generate the "in-place" header?
In other words, if a section is "book-h3" relative to the top-level
book, shouldn't it be "book-h3" whenever it is viewed?
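Computing the depth of the page a printer-friendly view is rooted at amounts to counting its ancestors. A minimal sketch, assuming a hypothetical $parents array that maps each nid to its parent nid (0 for the top-level book page), standing in for the parent relation that book.module stores; the function name book_depth() is also hypothetical.

```php
<?php
// Hypothetical: $parents maps each nid to its parent nid; the top-level
// book page has parent 0 (or is absent). Returns 1 for the top-level page.
function book_depth(int $nid, array $parents): int {
  $depth = 1;
  $seen = [];
  while (!empty($parents[$nid])) {
    // Guard against a corrupted hierarchy that loops back on itself.
    if (isset($seen[$nid])) {
      throw new RuntimeException("Cycle in book hierarchy at node $nid");
    }
    $seen[$nid] = true;
    $nid = $parents[$nid];
    $depth++;
  }
  return $depth;
}
```

The result would serve as the starting depth when rendering the printed subtree, so a page that is level 3 in the full book keeps its level-3 markup when printed on its own.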


In this case, we could apply the same principle to generating the <div>
tags around each section.




------------------------------------------------------------------------

May 5, 2005 - 14:58 : puregin

Attachment: http://drupal.org/files/issues/book_printer_friendly_div.patch (1.57 KB)

I am attaching a patch which fixes improper printer-friendly output, as
discussed.


The patch fixes book_print() and book_print_recurse() to insert a <div>
start tag before emitting the header, and to close it after
subsections have been recursively generated.


The <div> tag has an id attribute of the form id="sect-123", where the
node's nid is 123, and a class attribute of form class="sect/n/" where
the depth of the section is /n/.


I'm marking this issue as critical, since the present HTML output plays
havoc with any attempts at sane PDF generation.


I will address the issue of where printer-friendly subsections should
be 'rooted' in a follow-up.


Djun




------------------------------------------------------------------------

May 6, 2005 - 00:53 : Dries

Let's not abbreviate 'section' to 'sect'.  Also, we use '-' to separate
words in CSS names.  


- If the section ID uses the node ID, maybe we should use 'node-'
instead of 'sect-'?


- If the class uses depth, maybe we should use 'depth-' instead of
'sect'?


The same is true for your other patch where you suggest using 'n'.
Better to make that 'node-'.  (Feel free to merge both patches.)


Looks like the book.module generates a _lot_ of CSS IDs/classes. If the
introduction of these new IDs/classes allows us to remove some of the
existing IDs/classes, please do.




------------------------------------------------------------------------

May 6, 2005 - 20:26 : Steven

Here's my idea: why not use regular h tags for the book structure, and
renumber the header tags in the content, starting one level after the
current book depth?


Here's a function which does this:

<?php
function fix_up_headers($text, $level = 1) {
  // Find all header tags that are used (if any), opening or closing
  if (preg_match_all('@</?h([0-9]+)@i', $text, $matches) == 0) {
    return $text;
  }
  // Discard duplicates and sort them by number
  $used = array_unique(array_map('intval', $matches[1]));
  sort($used, SORT_NUMERIC);
  // Map each old level to its new level, starting at $level
  $map = array();
  $i = $level;
  foreach ($used as $old) {
    $map[$old] = $i++;
  }
  // Replace all tags in a single pass, so a tag that has already been
  // renumbered (e.g. h1 -> h2) is not renumbered again by a later rule.
  return preg_replace_callback('@<(/?)h([0-9]+)@i', function ($m) use ($map) {
    $new = $map[(int) $m[2]];
    if ($new <= 6) {
      return '<'. $m[1] .'h'. $new;
    }
    // Change level 7 headers and higher to divs.
    return $m[1] == '' ? '<div class="header-'. $new .'"' : '</div';
  }, $text);
}
?>




That way you can use header tags safely in the content (which is useful
in the context of a single page) without messing up the book output.






