[development] PHP 5 > aggregator.module rewrite to XML API?

Ashraf Amayreh mistknight at gmail.com
Wed Jun 20 00:23:12 UTC 2007


Listen, sanitizing in the sense of protecting against XSS and SQL injection
is a must. But others seem to count parsing malformed XML feeds and feeds
that are not fully standards compliant as part of sanitization. In my
opinion the second is not sanitization, and no aggregator needs to waste
code and time trying to handle non-XML or non-standards-compliant feeds.
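To illustrate the distinction, here is a minimal PHP sketch (hypothetical function names, written for a current PHP version rather than PHP 5). It treats malformed XML as a reason to reject a feed rather than something to repair, and limits actual sanitization to escaping feed text for HTML output. It assumes the SimpleXML and libxml extensions, which are enabled in standard PHP builds.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the aggregation module.

/**
 * Strict parsing: a feed that is not well-formed XML is rejected, not repaired.
 * Returns the SimpleXMLElement, or null if libxml reports a parse error.
 */
function aggregation_parse_strict(string $raw): ?SimpleXMLElement
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($raw);
    libxml_clear_errors();                        // discard the collected errors
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return $xml === false ? null : $xml;
}

/**
 * Sanitization proper (XSS): escape feed-supplied text before printing it as HTML.
 * For SQL injection, pass values to the database as placeholders (Drupal's
 * db_query() placeholders or PDO prepared statements) instead of concatenating them.
 */
function aggregation_escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

A feed for which aggregation_parse_strict() returns null can simply be reported to the user as invalid; escaping happens on output, independent of how the feed was parsed.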

I would be very surprised to find that SimplePie spends 11 seconds out of
12 on preventing XSS or SQL injection attacks alone. But hey, what do I
know about SimplePie? Does anyone know what SimplePie actually does during
those 11 seconds?

I use cURL rather than the Drupal function because Drupal uses fsockopen(),
and fsockopen() is known to have issues when retrieving from HTTP 1.0
servers as opposed to HTTP 1.1 ones. This is solved by sending specific
headers, but these headers are ignored on some Linux systems, which can
make the response time as much as 5x what cURL would take; normally the two
are similar in speed, although cURL is slightly faster. In specific
situations, such as a firewall in the way or a failed DNS lookup,
fsockopen() may stall for a few minutes. I read about these issues in a
number of articles and also found them listed in the comments under
fsockopen() in the PHP manual. cURL just saved me the hassle and probably
tens of issues :-)
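A minimal sketch of a cURL fetch with explicit time limits, assuming the PHP curl extension is enabled. The function name and the 10/30 second defaults are illustrative assumptions, not values taken from the aggregation module.

```php
<?php
// Hypothetical function name: fetch a feed with cURL and hard time limits, so a
// firewalled host or a failed DNS lookup fails within seconds instead of stalling.

function aggregation_fetch(string $url, int $connect_timeout = 10, int $total_timeout = 30)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow redirects...
        CURLOPT_MAXREDIRS      => 5,                // ...but not indefinitely
        CURLOPT_CONNECTTIMEOUT => $connect_timeout, // seconds for the connect phase, DNS lookup included
        CURLOPT_TIMEOUT        => $total_timeout,   // seconds for the whole transfer
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        trigger_error('Feed fetch failed for ' . $url . ': ' . curl_error($ch), E_USER_WARNING);
        return false;
    }
    return $body; // the cURL handle is released when $ch goes out of scope
}
```

CURLOPT_CONNECTTIMEOUT bounds name resolution plus the connection itself, which is where a dead DNS server or a firewall silently dropping packets would otherwise hang; CURLOPT_TIMEOUT caps the transfer as a whole.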

http://www.php.net/manual/en/function.fsockopen.php

Finally, with cURL I can support both HTTP and FTP URLs, whether or not
they are authenticated with a username and password. The only con of using
cURL is that it has to be installed on the server and enabled in PHP
(whether as a module or compiled in).
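The HTTP/FTP and authentication point can be sketched as an options builder. The names are hypothetical and the sketch assumes the PHP curl extension for the CURLOPT_* constants; the same CURLOPT_USERPWD option carries the credentials for both protocols.

```php
<?php
// Hypothetical helpers: check that cURL is available, and build cURL options
// for an HTTP(S) or FTP feed URL, with or without a login.

// cURL has to be installed on the server and enabled in PHP.
function aggregation_curl_available(): bool
{
    return extension_loaded('curl');
}

function aggregation_curl_options(string $url, ?string $user = null, ?string $pass = null): array
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https', 'ftp'], true)) {
        throw new InvalidArgumentException("Unsupported feed URL scheme: '$scheme'");
    }
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
    ];
    if ($user !== null) {
        // CURLOPT_USERPWD takes "username:password"; cURL uses it as the HTTP
        // credentials (Basic auth by default) or as the FTP login.
        $options[CURLOPT_USERPWD] = $user . ':' . (string) $pass;
    }
    return $options;
}
```

The returned array can be passed to curl_setopt_array() on a handle from curl_init(), and aggregation_curl_available() lets a module show a clear error instead of failing when cURL is missing.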

> Aside:
> I was just looking at the aggregation module and I like how it keeps XML
> parsing and parsing into your internal data structure apart.
> How did you decide on the data structure that you use in
> aggregation?

Going into detail would take too long, but there's an API between my
module and the feed-specific handler (through which the handler receives
the ready-to-parse SimpleXML object from my module) and another API between
the feed-specific handler and my module (through which the handler passes
back the parsed data). My module chooses which feed handler to use (RSS,
Atom, etc.) based on the user's input (a term he specifies) when he creates
a feed node. Thus, to parse a new XML format, all you have to do is add a
term to the aggregation vocabulary and add a feed-handler file that
conforms to the API calls. You can check the existing feed handlers to see
how easy that is.
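The handler API described above might look roughly like this in PHP. This is a sketch under stated assumptions: FeedHandler, RssHandler, AtomHandler and the aggregation_* functions are hypothetical names, the parsed item structure is reduced to title and link, and the real module's file-based handlers and taxonomy lookup may be organized differently.

```php
<?php
// Sketch of the module/handler split. All names here are hypothetical.

interface FeedHandler
{
    /**
     * Receives the ready-to-parse SimpleXML object from the module and returns
     * the parsed data as a list of ['title' => ..., 'link' => ...] items.
     */
    public function parse(SimpleXMLElement $xml): array;
}

class RssHandler implements FeedHandler
{
    public function parse(SimpleXMLElement $xml): array
    {
        $items = [];
        foreach ($xml->channel->item as $item) {
            $items[] = ['title' => (string) $item->title, 'link' => (string) $item->link];
        }
        return $items;
    }
}

class AtomHandler implements FeedHandler
{
    public function parse(SimpleXMLElement $xml): array
    {
        $items = [];
        // Atom uses a default (unprefixed) namespace, which SimpleXML property access matches.
        foreach ($xml->entry as $entry) {
            // Takes the first <link>; a fuller handler would prefer rel="alternate".
            $items[] = ['title' => (string) $entry->title, 'link' => (string) $entry->link['href']];
        }
        return $items;
    }
}

// Module side: map the vocabulary term chosen on the feed node to a handler.
function aggregation_handler_for_term(string $term): FeedHandler
{
    $handlers = ['RSS' => 'RssHandler', 'Atom' => 'AtomHandler']; // one entry per supported format
    if (!isset($handlers[$term])) {
        throw new InvalidArgumentException("No feed handler for term '$term'");
    }
    $class = $handlers[$term];
    return new $class();
}

// Module side: parse the raw feed once, then hand the SimpleXML object to the handler.
function aggregation_parse_feed(string $raw, string $term): array
{
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        throw new RuntimeException('Feed is not well-formed XML');
    }
    return aggregation_handler_for_term($term)->parse($xml);
}
```

In this sketch, supporting a new format means adding one class and one entry to the term map, which mirrors adding a term to the aggregation vocabulary plus a conforming handler file.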