Ahhh... so by sanitizing you mean accepting feeds that are not fully
standards-compliant? If that's what you mean, then definitely not. I totally
agree with Larry on this. Why waste my processing time on feeds that
are not worth a penny?<br>
<br>
Also, in my issue queue I haven't received a single complaint about a feed
that would not parse. That's not because I do any sanitization. I think
the reason is that I parse the feed as XML and look for
the main components, so even if it's not fully conforming,
it will make do as long as it has the main components. If it's beyond hope of being called XML in the first place, or just totally
messed up, I don't waste time on it, or coding effort that might
double or triple my module's code size. If that's a type of sanitization then good for me, but it definitely does not affect my module's performance :-)<br>
<br>
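A rough sketch of what I mean by "look for the main components", written in Python rather than the module's PHP just to keep it short. The function names and the choice of title and link as the "main components" are made up for illustration; this is not the module's actual code:<br>

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_items(feed_text):
    """Return a list of {'title', 'link'} dicts, or None if the text is not well-formed XML."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # Beyond hope as XML: skip the feed, no repair attempt.
        return None

    items = []
    for node in root.iter():
        # Hypothetical choice: treat RSS 'item' and Atom 'entry' as the entries.
        if local(node.tag) not in ("item", "entry"):
            continue
        fields = {}
        for child in node:
            name = local(child.tag)
            if name == "title":
                fields["title"] = (child.text or "").strip()
            elif name == "link":
                # RSS keeps the URL in the element text, Atom in the href attribute.
                fields["link"] = (child.text or "").strip() or child.get("href", "")
        # Unknown or missing elements are ignored; keep whatever we found.
        if fields.get("title") or fields.get("link"):
            items.append(fields)
    return items
```

A feed that is well-formed but sloppy (missing version attribute, unknown extra elements) still yields its entries, while anything the XML parser rejects comes back as None and is simply skipped.<br>
<br>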
Finally, I don't really care to make my module work with everyone and
everything. That's why I have clear PHP 5 and cURL requirements. In my opinion,
if a person is not prepared to get a good environment for his site, or
lets something as crappy as a host dictate his platform (let's drop Drupal because our host is using PHP 3.0, looool), then he's definitely not in my target audience. PHP 5 hosting has
become as common as PHP 4 hosting and is very close in price. <br>
<br>
VPS hosting is
becoming cheaper by the day for anyone who's serious about a site.
Take a look for
yourselves. I use the VPS provider linked below to sharpen my LAMP skills since it provides a clean-slate
installation. I have root access and I can do anything I want there. The comparison between HTML and XML feeds is simply flawed IMHO.<br><br><a href="http://www.vpslink.com/">http://www.vpslink.com/</a><br><br><div><span class="gmail_quote">
On 6/19/07, <b class="gmail_sendername">Morbus Iff</b> <<a href="mailto:morbus@disobey.com">morbus@disobey.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> RSS is XML. The XML spec explicitly says that invalid files should be discarded, not guessed at the way HTML is. Trying to make sense of a broken RSS feed is explicitly contrary to the spec. So, er, why are we spending so much time trying to sanitize? If it doesn't parse correctly, report an error "this site's RSS feed is f*ed up, tell 'em to fix it". Am I missing something here?
<br><br>Did you forget Postel's Law? Or the fact that for a feed to be<br>considered "invalid" (as opposed to "not well-formed") would mean that<br>Drupal would have to have a validating document type parser?
<br><br><a href="http://www.w3.org/TR/REC-xml/#dt-valid">http://www.w3.org/TR/REC-xml/#dt-valid</a><br><a href="http://www.w3.org/TR/REC-xml/#dt-wellformed">http://www.w3.org/TR/REC-xml/#dt-wellformed</a><br><br>And, honestly, telling people that their RSS is malformed and "pls fix,
<br>k thanks" is about as viable as telling someone that their HTML isn't<br>well formed. It just ain't going to happen.<br><br>--<br>Morbus Iff ( be realistic. demand the impossible. )<br>Technical: <a href="http://www.oreillynet.com/pub/au/779">
http://www.oreillynet.com/pub/au/779</a><br>Culture: <a href="http://www.disobey.com/">http://www.disobey.com/</a> and <a href="http://www.gamegrene.com/">http://www.gamegrene.com/</a><br>aim: akaMorbus / skype: morbusiff / icq: 2927491 /
<a href="http://jabber.org">jabber.org</a>: morbus<br></blockquote></div><br>