[consulting] HTML Character Sanitization Solution

Travis Carden travis.carden at gmail.com
Fri Apr 23 20:11:28 UTC 2010


For correcting invalid (x)HTML—even Microsoft Word crap—I know of no better
solution than HTML Purifier <http://drupal.org/project/htmlpurifier>, which
actually does an outstanding job, in my experience. It can be a little too
restrictive for some use cases as it filters out JavaScript, OBJECT/EMBED,
and IFRAME (and you can't configure it not to, as far as I can tell). In
some such situations it can be helpfully paired with Video Filter
<http://drupal.org/project/video_filter> and Iframe Filter
<http://drupal.org/project/iframe_filter> or insertFrame
<http://drupal.org/project/insertFrame>.
(I don't have a solution for using it with JavaScript.) I suspect this
module would solve most people's issues—I *think* it will even strip
non-ASCII characters. Benjamin Finklea gives a good explanation of the
module's installation and use in his book Drupal 6 Search Engine
Optimization <http://amazon.com/o/ASIN/1847198228/ref=nosim/traviscardenc-20>
from Packt.

Unfortunately, I can't use HTML Purifier with my current client because it's
too restrictive for his needs. So what I'm looking for is something that
does nothing other than strip or (preferably) convert non-ASCII characters
to their equivalent HTML entities. For example, “My problem,” he said, “is
simple—WYSIWYGs.” would become &ldquo;My problem,&rdquo; he said, &ldquo;is
simple&mdash;WYSIWYGs.&rdquo; I have a sense that a good WYSIWYG should do
this, but I haven't had any success with FCKEditor's "paste from Word"
feature. Has anyone else? Does TinyMCE do any better?
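
To make the conversion I'm describing concrete, here is a rough sketch in
Python (not Drupal/PHP code, just an illustration of the logic). The function
name to_entities is made up for this example. It leaves every ASCII character
alone, including existing markup such as < > and &, replaces each non-ASCII
character with its named HTML entity when the standard library's HTML 4 entity
table has one, and otherwise falls back to a numeric character reference.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities; leave ASCII untouched.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII (including existing tags and ampersands) passes through.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity from the HTML 4 table, e.g. 8220 -> &ldquo;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity available: use a decimal character reference.
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    sample = "\u201cMy problem,\u201d he said, \u201cis simple\u2014WYSIWYGs.\u201d"
    print(to_entities(sample))
```

Because ASCII is passed through as-is, this would not clean up invalid markup
the way HTML Purifier does; it only handles the character-encoding side of the
problem.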