RSS is XML. The XML spec explicitly says that invalid files should be discarded, not guessed at the way HTML is. Trying to make sense of a broken RSS feed is explicitly contrary to the spec. So, er, why are we spending so much time trying to sanitize? If it doesn't parse correctly, report an error "this site's RSS feed is f*ed up, tell 'em to fix it". Am I missing something here?
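For illustration, the strict "parse or report an error" approach asked about here might look roughly like the sketch below. It is written in Python rather than Drupal's own code. The `check_feed` name, the `source` label and the message wording are all made up for the example. Python's standard-library parser is non-validating, so this only detects feeds that are not well-formed; it says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET


def check_feed(xml_text, source="this site"):
    """Hypothetical helper: strict well-formedness check, no repair attempted.

    Returns (ok, message). Nothing here validates against a DTD or schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        return False, (
            f"{source}: RSS feed is not well-formed "
            f"(line {line}, column {column}); ask the publisher to fix it"
        )
    return True, f"{source}: feed parsed, root element <{root.tag}>"


# A well-formed feed and one with an unclosed <title> element.
good = "<rss version='2.0'><channel><title>Example</title></channel></rss>"
bad = "<rss version='2.0'><channel><title>Example</channel></rss>"

for feed in (good, bad):
    ok, message = check_feed(feed)
    print(ok, message)
```

Note that a feed such as `<rss><channel/></rss>` passes this check even though it lacks the elements an RSS consumer expects, since the check covers well-formedness only.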
Did you forget Postel's Law? Or the fact that treating a feed as "invalid" (as opposed to "not well-formed") would mean that Drupal would have to have a validating parser, one that checks the document against its document type definition?

http://www.w3.org/TR/REC-xml/#dt-valid
http://www.w3.org/TR/REC-xml/#dt-wellformed

And, honestly, telling people that their RSS is malformed and "pls fix, k thanks" is about as viable as telling someone that their HTML isn't well-formed. It just ain't going to happen.

-- 
Morbus Iff ( be realistic. demand the impossible. )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus