[development] PHP 5 > aggregator.module rewrite to XML API?

Ashraf Amayreh mistknight at gmail.com
Wed Jun 20 08:19:49 UTC 2007


> Are you saying the header Connection: Close is ignored?
>
> Any reason why Drupal should not use HTTP/1.1?

Examining drupal_http_request(), I found that it actually doesn't use
HTTP/1.1 in the first place, so I guess there's no problem in using it at
all. But I would still like to keep the ability to transparently accept
feeds over HTTP or FTP, and to give users the option to access
authenticated URLs. I don't know how widely used these features are, but I'd
hate to remove a feature that could help a user out.

Seems we were guilty of assuming what SimplePie did during these 11 seconds.
I still think it's going about it the wrong way, though. 11 seconds is
suicide. I sanitize the extracted data, rather than the feed string as a
whole. That's what I presume SimplePie is doing. I wish I could check it out
for myself, but my sleep indicators are overloaded.
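
To make "sanitize the extracted data" concrete, here is a minimal sketch and
not the module's actual code. The function names example_sanitize_items() and
example_plain() are made up for illustration, and the RSS 2.0 layout
(channel/item) is assumed. In the module itself one would more likely reach
for Drupal's own check_plain() or filter_xss() than plain strip_tags().

```php
<?php
// Hypothetical helper: reduce one extracted field to escaped plain text.
function example_plain($text) {
  return htmlspecialchars(strip_tags($text), ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: parse the feed first, then sanitize each field that
// was pulled out of it, instead of scrubbing the raw feed string as a whole.
function example_sanitize_items($raw_feed) {
  // Keep libxml quiet on malformed input; we only care whether it parsed.
  libxml_use_internal_errors(TRUE);
  $xml = simplexml_load_string($raw_feed);
  if ($xml === FALSE || !isset($xml->channel->item)) {
    return array();
  }
  $items = array();
  foreach ($xml->channel->item as $item) {
    $items[] = array(
      'title' => example_plain((string) $item->title),
      'description' => example_plain((string) $item->description),
    );
  }
  return $items;
}
```

The point is only that the cleanup cost scales with the handful of fields
actually kept, not with the size of the whole document.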

Morbus' suggestion to pass along the string as a whole sounds logical; I'll
see what I can do about that. Although I really had assumed that aggregation
happens from XML only, so the module would need a considerable amount of
change to accommodate non-XML strings. I'll study the option and see what I
can do. Anyone care to give me a patch for my next birthday? :-P
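
A rough sketch of how passing the raw string along might look, under loud
assumptions: every name below (example_parser_registry(),
example_parse_rss20_string(), example_parse_comic_html(),
example_aggregate_raw()) is hypothetical, and the "title"/"link" array is a
stand-in for whatever structure the aggregation API ends up expecting. Each
parser gets the untouched string and decides for itself whether to treat it
as XML.

```php
<?php
// Hypothetical registry: parser name => callback that accepts the raw string.
function example_parser_registry() {
  return array(
    'rss20' => 'example_parse_rss20_string',
    'comic_html' => 'example_parse_comic_html',
  );
}

// XML-based parser: loads the raw string itself, then extracts items.
function example_parse_rss20_string($raw) {
  libxml_use_internal_errors(TRUE);
  $xml = simplexml_load_string($raw);
  if ($xml === FALSE || !isset($xml->channel->item)) {
    return array();
  }
  $items = array();
  foreach ($xml->channel->item as $item) {
    $items[] = array(
      'title' => (string) $item->title,
      'link' => (string) $item->link,
    );
  }
  return $items;
}

// Non-XML scraper: pulls the first <img> (src, then alt) out of plain HTML.
function example_parse_comic_html($raw) {
  if (!preg_match('/<img[^>]+src="([^"]+)"[^>]*alt="([^"]*)"/i', $raw, $m)) {
    return array();
  }
  return array(array('title' => $m[2], 'link' => $m[1]));
}

// Dispatcher: hands the raw string to the chosen parser, unknown names yield
// an empty result. Every parser returns the same item structure.
function example_aggregate_raw($parser, $raw) {
  $parsers = example_parser_registry();
  if (!isset($parsers[$parser])) {
    return array();
  }
  return call_user_func($parsers[$parser], $raw);
}
```

With something like this, the XML case stays the default and a scraper is
just one more entry in the registry. Sanitizing the returned fields would
still have to happen afterwards; the sketch leaves that out.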


AA

On 6/20/07, Morbus Iff <morbus at disobey.com> wrote:
>
> > opinion the second is not sanitization and no aggregator needs to waste
> > the code and time on trying to handle non-XML or non-standards compliant
>
> It depends entirely on your definition of "aggregator". In your module,
> you have only one parser, really - PHP's SimpleXML (or whatever it's
> called) that then sends the loaded data structure to the smaller "do
> things with it" (ie., RSS20.inc, etc.) subparsers. However, I'd think
> that it'd be far more flexible to send the raw strings around /as well/
> - then one could support, for example, non-XML documents (or, in my
> particular case, I could write scrapers for sites that don't support
> feeds [or feeds that contain useful data]) so that I'd be able to hook
> into the generic aggregating process. Aggregation != just XML, IMO.
>
> I'd love, for example, to be able to add a "feed" that points to (pff,
> making crap outta my ass here) some comic site's "latest comic" HTML,
> choose a custom-made parser that expects that HTML, and return the same
> data structure that the aggregation API expects as legit. This /is/
> aggregation - pulling disparate sources together.
>
> > I would be very surprised if I found that SimplePie is wasting 11
> > seconds out of 12 in preventing XSS or SQL injection attacks alone. But
> > hey, what do I know about SimplePie. Does anyone know what SimplePie
> > actually does within these 11 seconds?
>
> SimplePie's set_stupidly_fast is a wrapper around:
>
>    $this->enable_order_by_date(false);
>    $this->remove_div(false);
>    $this->strip_comments(false);
>    $this->strip_htmltags(false);
>    $this->strip_attributes(false);
>    $this->set_image_handler(false);
>
> None of those are "fix broken XML". I reran the initial test like so:
>
>    $feed->set_stupidly_fast(TRUE);
>    $feed->enable_order_by_date(TRUE);
>
> i.e. first shutting everything off, then enabling one command:
>
>    $feed->enable_order_by_date(TRUE);      2 seconds
>    $feed->remove_div(TRUE);                1 second
>    $feed->strip_comments(TRUE);            2 seconds
>    $feed->strip_htmltags(TRUE);            2 seconds
>    $feed->strip_attributes(TRUE);          2 seconds
>    $feed->set_image_handler(TRUE);         1 second
>
> --
> Morbus Iff ( if god is my witness, god must be blind )
> Technical: http://www.oreillynet.com/pub/au/779
> Culture: http://www.disobey.com/ and http://www.gamegrene.com/
> aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
>