Listen, sanitizing in the sense of protecting against XSS and SQL injection is a must. But others seem to count parsing feeds that are not well-formed XML, or not fully standards-compliant, as part of sanitization. In my opinion that is not sanitization, and no aggregator needs to waste code and time trying to handle non-XML or non-standards-compliant feeds.
<br><br>I would be very surprised to find that SimplePie spends 11 seconds out of 12 on preventing XSS or SQL injection attacks alone. But hey, what do I know about SimplePie? Does anyone know what SimplePie actually does within those 11 seconds?
<br><br>I use CURL rather than the Drupal function because Drupal uses fsockets. fsockets are known to have issues when retrieving over HTTP 1.0 as opposed to HTTP 1.1. This is solved by sending specific headers, but these headers are ignored on some Linux systems, and that may increase the response time to as much as 5x what CURL would take. Normally the two are similar in speed, although CURL is slightly faster. In specific situations, such as the presence of a firewall or a failure in DNS resolution, fsockets may stall for a few minutes. I read about these issues in a number of articles, and I also found them listed in the comments under fsockopen. CURL just saved me the hassle and probably tens of issues :-)
<br><br><a href="http://www.php.net/manual/en/function.fsockopen.php">http://www.php.net/manual/en/function.fsockopen.php</a><br><br>Finally, with CURL I can support both HTTP and FTP URLs, whether they are authenticated with a username and password or not. The only con to using CURL is that it has to be installed on the server machine and enabled in PHP (whether as a module or compiled in).
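<br><br>To illustrate the point about CURL, here is a minimal sketch of fetching a feed with explicit timeouts and optional credentials. The function names and the timeout values are my own assumptions for illustration, not code from the aggregation module.

```php
<?php
// Hypothetical helpers for illustration only; the names and the
// timeout values are assumptions, not taken from the aggregation module.

/**
 * Build the cURL options for fetching a feed over HTTP or FTP.
 */
function example_feed_curl_options($url, $username = null, $password = null) {
  $options = array(
    CURLOPT_URL            => $url,
    CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
    CURLOPT_CONNECTTIMEOUT => 10,   // give up on a stalled connection attempt after 10 s
    CURLOPT_TIMEOUT        => 30,   // cap the whole transfer at 30 s
  );
  // Credentials are optional; the same option serves HTTP and FTP URLs.
  if ($username !== null) {
    $options[CURLOPT_USERPWD] = $username . ':' . $password;
  }
  return $options;
}

/**
 * Fetch the raw feed body, or throw if the transfer fails.
 */
function example_feed_fetch($url, $username = null, $password = null) {
  $handle = curl_init();
  curl_setopt_array($handle, example_feed_curl_options($url, $username, $password));
  $body = curl_exec($handle);
  if ($body === false) {
    throw new RuntimeException('Fetch failed for ' . $url . ': ' . curl_error($handle));
  }
  return $body;
}
```

The explicit timeouts are the part that matters for the stalling problem: a dead host fails after a bounded wait instead of hanging the whole aggregation run.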
<br><br>> Aside:<br>> I was just looking at aggregation module and I like how it keeps XML<br>> parsing and parsing into your internal data structure apart.<br>> How did you decide on the data structure that you use in
<br>> aggregation?<br><br>Going into details would take too long, but there's an API between my module and the feed-specific handler (where the handler receives the ready-to-parse SimpleXML object from my module) and another API between the feed-specific handler and my module (where the handler passes back the parsed data). My module chooses which feed handler to use (RSS, ATOM, etc.) based on the user's input (a term he specifies) when he creates a feed node. Thus, to parse a new XML format, all you have to do is add a term to the aggregation vocabulary and add a feed-handler file that conforms to the API calls. You can check the existing feed handlers to see how easy it is to do that.
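<br><br>Roughly, the idea looks like the sketch below. This is not the module's actual code: the function names, the term-to-handler map, and the shape of the returned arrays are all hypothetical, and the real handlers live in separate files rather than in one place.

```php
<?php
// Hypothetical sketch of the handler idea; none of these names or array
// shapes come from the aggregation module itself.

/** RSS handler: receives a parsed SimpleXML object, returns plain item arrays. */
function example_handler_rss(SimpleXMLElement $xml) {
  $items = array();
  foreach ($xml->channel->item as $item) {
    $items[] = array(
      'title' => (string) $item->title,
      'link'  => (string) $item->link,
    );
  }
  return $items;
}

/** Atom handler: same contract, different element names. */
function example_handler_atom(SimpleXMLElement $xml) {
  $items = array();
  foreach ($xml->entry as $entry) {
    $items[] = array(
      'title' => (string) $entry->title,
      'link'  => (string) $entry->link['href'],
    );
  }
  return $items;
}

/**
 * Pick a handler from the term chosen on the feed node and run it.
 * Input that is not well-formed XML is rejected, not repaired.
 */
function example_parse_feed($term, $raw_xml) {
  // Supporting a new format means adding one entry here plus one handler.
  $handlers = array(
    'RSS'  => 'example_handler_rss',
    'ATOM' => 'example_handler_atom',
  );
  if (!isset($handlers[$term])) {
    throw new InvalidArgumentException('No feed handler for term: ' . $term);
  }
  libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
  $xml = simplexml_load_string($raw_xml);
  if ($xml === false) {
    throw new RuntimeException('Feed is not well-formed XML.');
  }
  return call_user_func($handlers[$term], $xml);
}
```

The XML parsing happens once, before any handler runs, so each handler only has to map element names into the common item structure.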