Jeremy Kerr wrote:
Currently, the filter module will mis-handle HTML tags that span more than one line. For example:
<a href="#"
>link</a>
Will be filtered to:
<a href="#" &gt;link</a>
This is caused by the attempt to match '.>' in filter_xss(). Since the . doesn't (by default) match a newline, the filter will try to escape the closing > on the <a> tag.
This results in aggregated feeds with badly-formed HTML.
This change removes the . match, so that a newline as the last character in an opening HTML tag is accepted. To ensure that we still require at least one character in a tag, we also change the [^>]* to [^>]+.
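The difference between the two patterns can be checked outside Drupal with a short script. This is only a sketch: tokenize() is a hypothetical helper that is not part of filter.module, and it uses preg_match_all() to list the pieces that would be handed to the callback, rather than calling filter_xss() itself.

```php
<?php
// Sketch only: tokenize() is a hypothetical helper, not part of filter.module.
// It rebuilds the expression from the patch around the given middle
// alternative and returns every piece of $string that the pattern matches.
function tokenize($middle_alternative, $string) {
  $pattern = '%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    ' . $middle_alternative . '
    |                 # or
    >                 # just a >
    )%x';
  preg_match_all($pattern, $string, $matches);
  return $matches[0];
}

// An opening tag whose last character before the > is a newline.
$input = "<a href=\"#\"\n>link</a>";

$old_tokens = tokenize('<[^>]*.(>|$)', $input);
$new_tokens = tokenize('<[^>]+(>|$)', $input);

// Old pattern: the opening tag never matches as a tag, so its closing >
// is picked up by the "just a >" alternative. Tokens: '>' and '</a>'.
var_export($old_tokens);
echo "\n";

// New pattern: the whole opening tag, newline included, is one token,
// followed by '</a>'.
var_export($new_tokens);
echo "\n";
```

With the old pattern the first token is a bare >, which is what ends up escaped; with the new pattern the first token is the complete opening tag.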
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
---
 modules/filter/filter.module |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: drupal-6.2/modules/filter/filter.module
===================================================================
--- drupal-6.2.orig/modules/filter/filter.module  2008-07-01 09:28:22.000000000 +1000
+++ drupal-6.2/modules/filter/filter.module  2008-07-22 09:24:48.000000000 +1000
@@ -983,7 +983,7 @@ function filter_xss($string, $allowed_ta
     (
     <(?=[^a-zA-Z!/])  # a lone <
     |                 # or
-    <[^>]*.(>|$)      # a string that starts with a <, up until the > or the end of the string
+    <[^>]+(>|$)       # a string that starts with a <, up until the > or the end of the string
     |                 # or
     >                 # just a >
     )%x', '_filter_xss_split', $string);

Your work is respected and appreciated. Could you please create an issue in the drupal.org queue against the Drupal project (component filter.module) and attach this patch there? Then set the issue's status to "patch (code needs review)".