[drupal-devel] [bug] Aggregator: Corrupted title based on item description with HTML

Morbus Iff drupal-devel at drupal.org
Thu Mar 31 22:05:20 UTC 2005


Issue status update for http://drupal.org/node/19573

 Project:      Drupal
-Version:      4.5.2
+Version:      cvs
 Component:    aggregator.module
 Category:     bug reports
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  edhel
 Updated by:   Morbus Iff
 Status:       active

Wouldn't you want to remove the HTML from the description FIRST, and
then take the first 40 characters of what remains? The proposed code
could still return an empty title, especially if the first 40 characters
of the description are all markup, as in "[a
href="http://www.disobey.com/"][strong]an example of an empty title
based on a 40 character trunc before HTML removal[/strong][/a]". As for
the i18n stuff, I know nothing about it, so someone else will have to
address the change to ereg instead of preg.
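
The ordering suggested above (strip the markup first, truncate second)
could look roughly like the sketch below. It is only an illustration, not
a patch against aggregator.module: example_truncate() is a hypothetical
stand-in for Drupal's truncate_utf8() that counts UTF-8 characters, and
example_title_from_description() is an invented helper name. It assumes
the description is valid UTF-8 and relies on PHP's built-in strip_tags().

```php
<?php
// Hypothetical stand-in for Drupal's truncate_utf8(): keeps at most $len
// UTF-8 characters. This is NOT the Drupal implementation.
function example_truncate($text, $len) {
  if (preg_match('/^.{0,' . (int) $len . '}/us', $text, $matches)) {
    return $matches[0];
  }
  // preg_match() fails on invalid UTF-8 when /u is set.
  return '';
}

// Hypothetical helper: build a fallback title from an item description.
function example_title_from_description($description, $len = 40) {
  // Strip markup first, so tags never use up the character budget.
  $text = trim(strip_tags($description));
  $title = example_truncate($text, $len);
  if ($title === $text) {
    // Short enough already: nothing was cut, so no ellipsis is needed.
    return $title;
  }
  // Drop the trailing whitespace or partial word left by the cut.
  $cut = preg_replace('/\s+\S*$/u', '', $title);
  return (is_string($cut) && $cut !== '' ? $cut : $title) . '...';
}
```

With the real angle-bracket form of the example description above, the
tags are removed before any characters are counted, so the resulting
title starts with "an example of" rather than coming back empty.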


Morbus Iff



Previous comments:
------------------------------------------------------------------------

March 27, 2005 - 23:59 : edhel

This problem affects aggregator.module in Drupal 4.5.x and 4.6 as well.
I noticed it when aggregating posts from LiveJournal.com that have a
blank title and HTML in the description at the same time.
In aggregator.module, aggregator_parse_feed() contains the following
code at line 501 (Drupal 4.5.2):

<?php
if ($item['TITLE']) {
  $title = $item['TITLE'];
}
else {
  $title = preg_replace('/^(.*)[^\w;&].*?$/', "\\1",
    truncate_utf8($item['DESCRIPTION'], 40));
}
?>


I think this preg_replace call is incorrect because:
1) it doesn't strip HTML tags
2) it may handle national characters incorrectly (i.e. \w): maybe
ereg_replace is a better solution?
To process such items correctly, I changed the code to this:

<?php
if ($item['TITLE']) {
  $title = $item['TITLE'];
}
else {
  // $title = preg_replace('/^(.*)[^\w;&].*?$/', "\\1", truncate_utf8($item['DESCRIPTION'], 40));
  $title = ereg_replace('<.*?(>|$)', '',
    truncate_utf8($item['DESCRIPTION'], 40));
  $title = ereg_replace('/^(.*?)[^[:alnum:];&].*?$/',
    "\\1", $title) . "...";
}
?>
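
On point 2 above, one possible alternative to switching to ereg_replace
is to stay with preg_replace and make the pattern Unicode-aware: with the
/u modifier, \p{L} and \p{N} match letters and digits in any script. The
sketch below is an assumption about how that could look, not something
proposed in this issue; example_trim_to_last_word() is a hypothetical
name, and the input is assumed to be valid UTF-8 with tags already
removed. (The ereg functions were also later removed from PHP, in 7.0.)

```php
<?php
// Hypothetical helper: trim an already-truncated string back to its last
// complete word, treating letters and digits of any script as word
// characters.
function example_trim_to_last_word($text) {
  // Group 1 greedily keeps everything up to the last letter or digit that
  // is followed by at least one separator and then a (possibly empty)
  // trailing word fragment.
  $pattern = '/^(.*[\p{L}\p{N}])[^\p{L}\p{N}]+[\p{L}\p{N}]*$/us';
  $result = preg_replace($pattern, '$1', $text);
  // preg_replace() returns NULL on error (for example invalid UTF-8);
  // fall back to the unmodified input in that case.
  return is_string($result) ? $result : $text;
}
```

A string consisting of a single word has no separator to cut at, so the
pattern does not match and the text is returned unchanged.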






