<font face="verdana,sans-serif">For correcting invalid (X)HTML, even Microsoft Word crap, I know of no better solution than <a href="http://drupal.org/project/htmlpurifier">HTML Purifier</a>, which does an outstanding job in my experience. It can be too restrictive for some use cases, since it filters out JavaScript, OBJECT/EMBED, and IFRAME (and, as far as I can tell, you can't configure it not to). In some of those situations it pairs well with <a href="http://drupal.org/project/video_filter">Video Filter</a> and <a href="http://drupal.org/project/iframe_filter">Iframe Filter</a> or <a href="http://drupal.org/project/insertFrame">insertFrame</a>. (I don't have a solution for using it with JavaScript.) I suspect this module would solve most people's issues; I <i>think</i> it will even strip non-ASCII characters. Benjamin Finklea gives a good explanation of the module's installation and use in his book <a href="http://amazon.com/o/ASIN/1847198228/ref=nosim/traviscardenc-20">Drupal 6 Search Engine Optimization</a> from Packt.<br>
<br>Unfortunately, I can't use HTML Purifier with my current client because it's too restrictive for his needs. So what I'm looking for is something that does nothing other than strip or (preferably) convert non-ASCII characters to their equivalent HTML entities. For example, “My problem,” he said, “is simple—WYSIWYGs.” would become &ldquo;My problem,&rdquo; he said, &ldquo;is simple&mdash;WYSIWYGs.&rdquo;. I have a sense that a good WYSIWYG should do this, but I haven't had any success with FCKeditor's "paste from Word" feature. Has anyone else? Does TinyMCE do any better?<br>
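<br>To make the conversion I'm describing concrete, here is a minimal sketch in Python. It is not a Drupal module or an input filter, just an illustration of the logic. The function name to_entities is made up for this example. It assumes named entities should be used where the standard HTML 4 table has one, with numeric references as the fallback for everything else.<br>

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with an HTML entity.

    ASCII characters (including existing markup) pass through untouched.
    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        if ch.isascii():
            # Plain ASCII: leave as-is so existing tags are not disturbed.
            out.append(ch)
        elif ord(ch) in codepoint2name:
            # A named HTML 4 entity exists, e.g. U+201C maps to "ldquo".
            out.append("&%s;" % codepoint2name[ord(ch)])
        else:
            # No named entity: fall back to a decimal numeric reference.
            out.append("&#%d;" % ord(ch))
    return "".join(out)


print(to_entities("“My problem,” he said, “is simple—WYSIWYGs.”"))
# prints: &ldquo;My problem,&rdquo; he said, &ldquo;is simple&mdash;WYSIWYGs.&rdquo;
```

If numeric references alone are acceptable, Python's built-in text.encode("ascii", "xmlcharrefreplace") does the same job in one call. A real Drupal solution would need the equivalent logic in PHP, wired in as an input filter.<br>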
</font>