[development] PHP 5 > aggregator.module rewrite to XML API?

Scott Trudeau strudeau at umich.edu
Wed Jun 20 17:09:17 UTC 2007


Re: curl vs. drupal_http_request

In addition to handling FTP URLs and HTTP authentication, curl
can optionally follow redirects and handle things like
Content-Disposition headers (with custom code), which I've found to be
important when dealing with enclosures.  I'm not sure how flexible or
capable drupal_http_request is on those kinds of issues.
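
A rough sketch of the kind of custom code I mean, not a tested
implementation. The two example_* function names are made up for
illustration, the closure needs PHP 5.3 or later, and only the plain
filename= form of the header is handled:

```php
<?php
// Hypothetical helper (not part of Drupal or curl): pull a filename out of a
// Content-Disposition header value. Handles filename="x" and filename=x only;
// the RFC 6266 filename*= form is ignored.
function example_disposition_filename($value) {
  if (!preg_match('/filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i', $value, $m)) {
    return NULL;
  }
  // Group 1 is the quoted form, group 2 the bare token.
  $name = ($m[1] !== '') ? $m[1] : (isset($m[2]) ? $m[2] : '');
  // basename() drops any directory parts a hostile server might send.
  $name = basename($name);
  return ($name === '') ? NULL : $name;
}

// Hypothetical fetcher: follows redirects, optionally authenticates, and
// reports the filename suggested by the final response, if any.
function example_fetch_enclosure($url, $userpwd = NULL) {
  $lines = array();
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
  curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
  if ($userpwd !== NULL) {
    // Expected as "user:password".
    curl_setopt($ch, CURLOPT_USERPWD, $userpwd);
  }
  // curl hands over each raw header line; the callback must return its length.
  curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$lines) {
    $lines[] = $line;
    return strlen($line);
  });
  $body = curl_exec($ch);

  // Headers from every hop are collected, so the last match wins.
  $filename = NULL;
  foreach ($lines as $line) {
    if (stripos($line, 'Content-Disposition:') === 0) {
      $filename = example_disposition_filename(substr($line, strlen('Content-Disposition:')));
    }
  }
  return array('body' => $body, 'filename' => $filename);
}
```

Calling example_fetch_enclosure() with an enclosure URL and an optional
"user:password" string would return the body plus whatever filename the
server suggested, or NULL when it sent no usable header.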

Scott

On 6/20/07, Ashraf Amayreh <mistknight at gmail.com> wrote:
> > Are you saying the header Connection: Close is ignored?
> >
> > Any reason why Drupal should not use HTTP/1.1?
>
> Examining drupal_http_request, I found that it doesn't actually use HTTP/1.1
> in the first place, so I guess there's no problem in using it at all.
> But I would still like to maintain the ability to transparently accept feeds
> from HTTP or FTP as well as providing the users the option to access
> authenticated URLs. I don't know how widely used these features are, but I'd
> hate to remove a feature that could help a user out.
>
> Seems we were guilty of assuming what SimplePie did during these 11 seconds.
> Although I still think it's going about it the wrong way. 11 seconds is
> suicide. I sanitize against the extracted data, rather than the feed string
> as a whole. That's what I presume SimplePie is doing. I wish I could check
> it out for myself but my sleep indicators are overloaded.
>
> Morbus' suggestion to pass along the string as a whole sounds logical, I'll
> see what I can do about that. Although I really had assumed that aggregation
> happens from XML only, so the module would need a considerable amount of
> change to accommodate non-XML strings. I'll study the option and see what I
> can do. Anyone care to give me a patch for my next birthday? :-P
>
>
> AA
>
> On 6/20/07, Morbus Iff <morbus at disobey.com> wrote:
> > > opinion the second is not sanitization and no aggregator needs to waste
> > > the code and time on trying to handle non-XML or non-standards compliant
> >
> > It depends entirely on your definition of "aggregator". In your module,
> > you have only one parser, really - PHP's SimpleXML (or whatever it's
> > called) that then sends the loaded data structure to the smaller "do
> > things with it" (ie., RSS20.inc, etc.) subparsers. However, I'd think
> > that it'd be far more flexible to send the raw strings around /as well/
> > - then one could support, for example, non-XML documents (or, in my
> > particular case, I could write scrapers for sites that don't support
> > feeds [or feeds that contain useful data]) so that I'd be able to hook
> > into the generic aggregating process. Aggregation != just XML, IMO.
> >
> > I'd love, for example, to be able to add a "feed" that points to (pff,
> > making crap outta my ass here) some comic site's "latest comic" HTML,
> > choose a custom-made parser that expects that HTML, and return the same
> > data structure that the aggregation API expects as legit. This /is/
> > aggregation - pulling disparate sources together.
> >
> > > I would be very surprised if I found that SimplePie is wasting 11
> > > seconds out of 12 in preventing XSS or SQL injection attacks alone. But
> > > hey, what do I know about SimplePie. Does anyone know what SimplePie
> > > actually does within these 11 seconds?
> >
> > SimplePie's set_stupidly_fast is a wrapper around:
> >
> >    $this->enable_order_by_date(false);
> >    $this->remove_div(false);
> >    $this->strip_comments(false);
> >    $this->strip_htmltags(false);
> >    $this->strip_attributes(false);
> >    $this->set_image_handler(false);
> >
> > None of those are "fix broken XML". I reran the initial test like so:
> >
> >    $feed->set_stupidly_fast(TRUE);
> >    $feed->enable_order_by_date(TRUE);
> >
> > i.e. first shutting everything off, then enabling one command at a time:
> >
> >    $feed->enable_order_by_date(TRUE);      2 seconds
> >    $feed->remove_div(TRUE);                1 second
> >    $feed->strip_comments(TRUE);            2 seconds
> >    $feed->strip_htmltags(TRUE);            2 seconds
> >    $feed->strip_attributes(TRUE);          2 seconds
> >    $feed->set_image_handler(TRUE);         1 second
> >
> > --
> > Morbus Iff ( if god is my witness, god must be blind )
> > Technical: http://www.oreillynet.com/pub/au/779
> > Culture: http://www.disobey.com/ and http://www.gamegrene.com/
> > aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
> >
>
>