Ahhh... so by sanitizing you mean accepting feeds that are not fully standards-compliant? If that's what you mean then definitely not. I totally agree with Larry on this. Why waste my processing time on feeds that are not worth a penny?

Also, I haven't received a single complaint in my issue queue about a feed that would not parse. That's not because I do any sanitization; I think it's because I parse the feed as XML and look for the main components, so even a feed that isn't fully conforming will make do as long as it has those components. If a feed is beyond hope of being called XML in the first place, or is just totally messed up, I don't waste time on it, or on the coding effort that might double or triple my module's code size. If that's a type of sanitization then good for me, but it definitely does not affect my module's performance :-)
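
For illustration only, the "parse it as XML and look for the main components" approach could be sketched roughly like this. This is Python, not the module's actual PHP code; the function name `extract_items`, the sample feeds, and the choice of title/link/description as the "main components" are all assumptions made for the example, and it only handles un-namespaced RSS 2.0-style elements.

```python
import xml.etree.ElementTree as ET


def extract_items(feed_text):
    """Return one dict per <item>, or None if the text is not well-formed XML.

    Hypothetical helper: missing components become empty strings and
    unknown elements are ignored, but no repair of broken XML is attempted.
    """
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # Beyond hope as XML: give up instead of guessing.
        return None

    items = []
    # iter() finds <item> at any depth, so an unusual layout still works.
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


# Well-formed but sloppy: no version attribute, a stray element, a missing link.
SAMPLE = """<rss>
  <channel>
    <title>Example</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><bogus>ignored</bogus></item>
  </channel>
</rss>"""

# Not well-formed: <title> and <item> are never closed.
BROKEN = "<rss><channel><item><title>Oops</channel></rss>"
```

Under these assumptions, `extract_items(SAMPLE)` yields two items (the second with an empty link), while `extract_items(BROKEN)` returns `None` without any attempt at repair.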

Finally, I don't really care to make my module work with everyone and everything. That's why I have clear PHP 5 and cURL requirements. In my opinion, if a person is not prepared to get a good environment for his site, or lets something as crappy as a host dictate his platform (let's drop Drupal because our host is using PHP 3.0 looool), then he's definitely not in my target audience. PHP 5 hosting has become as common as PHP 4 hosting and is very close in price.

VPS hosting is becoming cheaper by the day for anyone who's serious about a site. Take a look for yourselves. I use this VPS provider to sharpen my LAMP skills, since it provides a clean-slate installation. I have root access and I can do anything I want there. The comparison between HTML and XML feeds is simply flawed IMHO.

http://www.vpslink.com/

On 6/19/07, Morbus Iff <morbus@disobey.com> wrote:
> RSS is XML.  The XML spec explicitly says that invalid files should be discarded, not guessed at the way HTML is.  Trying to make sense of a broken RSS feed is explicitly contrary to the spec.  So, er, why are we spending so much time trying to sanitize?  If it doesn't parse correctly, report an error "this site's RSS feed is f*ed up, tell 'em to fix it".  Am I missing something here?

Did you forget Postel's Law? Or the fact that for a feed to be
considered "invalid" (as opposed to "not well-formed") would mean that
Drupal would have to have a validating document type parser?

http://www.w3.org/TR/REC-xml/#dt-valid
http://www.w3.org/TR/REC-xml/#dt-wellformed
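
The difference between the two definitions can be shown with a small sketch. It is written in Python purely for illustration; the helper name `is_well_formed` and both sample documents are made up for the example. Python's standard parser is non-validating: it reads a DTD's declarations but does not enforce them, so it accepts a document that is well-formed yet invalid.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML at all; says nothing about validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Well-formed but invalid: the inline DTD requires a <title> child,
# and the document has none. A non-validating parser does not notice.
WELL_FORMED_BUT_INVALID = (
    "<!DOCTYPE feed ["
    "<!ELEMENT feed (title)>"
    "<!ELEMENT title (#PCDATA)>"
    "]>"
    "<feed/>"
)

# Not well-formed: <title> is never closed, so any XML parser must reject it.
NOT_WELL_FORMED = "<feed><title>Oops</feed>"
```

Here `is_well_formed(WELL_FORMED_BUT_INVALID)` is true and `is_well_formed(NOT_WELL_FORMED)` is false; rejecting the first document would take a validating parser.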

And, honestly, telling people that their RSS is malformed and "pls fix,
k thanks" is about as viable as telling someone that their HTML isn't
well formed. It just ain't going to happen.

--
Morbus Iff ( be realistic. demand the impossible. )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus