[consulting] HTML Character Sanitization Solution

Ben West westbywest at gmail.com
Fri Apr 23 20:19:45 UTC 2010


If you use the HTML Purifier library, please take note of older
versions' unfortunate habit of stripping out the name attribute from
anchor tags, i.e. the 'name="blahblah"' bit.

Make sure to get v3.3.0 or newer of the library.

On Fri, Apr 23, 2010 at 3:11 PM, Travis Carden <travis.carden at gmail.com> wrote:
> For correcting invalid (x)HTML—even Microsoft Word crap—I know of no better
> solution than HTML Purifier, which actually does an outstanding job, in my
> experience. It can be a little too restrictive for some use cases as it
> filters out JavaScript, OBJECT/EMBED, and IFRAME (and you can't configure it
> not to, as far as I can tell). In some such situations it can be helpfully
> paired with Video Filter and Iframe Filter or insertFrame. (I don't have a
> solution for using it with JavaScript.) I suspect this module would solve
> most people's issues—I think it will even strip non-ASCII characters.
> Benjamin Finklea gives a good explanation of the module's installation and
> use in his book Drupal 6 Search Engine Optimization from Packt.
>
> Unfortunately, I can't use HTML Purifier with my current client because it's
> too restrictive for his needs. So what I'm looking for is something that
> does nothing other than strip or (preferably) convert non-ASCII characters
> to their equivalent HTML entities, e.g. “My problem,” he said, “is
> simple—WYSIWYGs.” would become &ldquo;My problem,&rdquo; he said, &ldquo;is
> simple&mdash;WYSIWYGs.&rdquo;. I have a sense that a good WYSIWYG should do
> this, but I haven't had any success with FCKEditor's "paste from Word"
> feature. Has anyone else? Does TinyMCE do any better?

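For the narrower task described in the quoted message (leave the markup
alone and only convert non-ASCII characters to HTML entities), here is a
minimal sketch. It is in Python rather than PHP, and it assumes the input
is already a decoded Unicode string. The function name encode_non_ascii is
made up for illustration; html.entities.codepoint2name is the Python
standard library's table of HTML 4 named entities.

```python
from html.entities import codepoint2name


def encode_non_ascii(text):
    """Replace every non-ASCII character with an HTML entity.

    ASCII characters, including <, > and &, pass through untouched,
    so existing markup is preserved as-is.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII: keep unchanged.
            out.append(ch)
        elif cp in codepoint2name:
            # A named HTML 4 entity exists, e.g. &ldquo; or &mdash;.
            out.append("&%s;" % codepoint2name[cp])
        else:
            # No named entity: fall back to a numeric reference.
            out.append("&#%d;" % cp)
    return "".join(out)


if __name__ == "__main__":
    sample = "“My problem,” he said, “is simple—WYSIWYGs.”"
    print(encode_non_ascii(sample))
    # &ldquo;My problem,&rdquo; he said, &ldquo;is simple&mdash;WYSIWYGs.&rdquo;
```

Because this only touches characters above code point 127, it does no
sanitizing at all; it would have to run alongside whatever filtering the
site already does, not in place of it.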


-- 
Ben West
westbywest at gmail.com

