[development] PHP 5 > aggregator.module rewrite to XML API?
seanr at ngpsoftware.com
Wed Jun 20 17:10:52 UTC 2007
Curl is frequently not available. I've had to have it installed a few
times, but a LOT of people won't even have that as an option.
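For what it's worth, here is a minimal sketch of how a module could cope
with that: use curl only when the extension is actually loaded, and
otherwise fall back to drupal_http_request. The helper name
example_pick_fetch_backend and its optional argument are made up for
illustration; nothing like this exists in core or in the aggregator rewrite.

```php
<?php
// Hypothetical helper, not part of Drupal or aggregator.module.
// Returns the name of the fetch backend a site could use, preferring
// curl only when the PHP curl extension is actually available.
function example_pick_fetch_backend($curl_available = NULL) {
  if ($curl_available === NULL) {
    // curl_init() is only defined when the curl extension is loaded.
    $curl_available = function_exists('curl_init');
  }
  return $curl_available ? 'curl' : 'drupal_http_request';
}
```

Calling example_pick_fetch_backend() with no argument detects curl on the
running host; passing TRUE or FALSE forces the answer, which is only there
to make the choice easy to test.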
Scott Trudeau wrote:
> Re: curl vs. drupal_http_request
> In addition to handling FTP URLs and HTTP authentication, curl
> can optionally follow redirects and handle things like content
> disposition headers (with custom code), which I've found to be
> important when dealing with enclosures. Not sure how flexible/capable
> drupal_http_request is on those kinds of issues.
> On 6/20/07, Ashraf Amayreh <mistknight at gmail.com> wrote:
>> > Are you saying the header Connection: Close is ignored?
>> > Any reason why Drupal should not use HTTP/1.1?
>> Examining drupal_http_request, I found that it doesn't actually use
>> HTTP/1.1 in the first place, so I guess there's no problem in using it.
>> But I would still like to maintain the ability to transparently accept
>> feeds over HTTP or FTP, as well as giving users the option to access
>> authenticated URLs. I don't know how widely used these features are,
>> but I'd hate to remove a feature that could help a user out.
>> Seems we were guilty in assuming what SimplePie did during these 11 seconds.
>> Although I still think it's going about it the wrong way. 11 seconds is
>> suicide. I sanitize against the extracted data, rather than the feed
>> as a whole. That's what I presume SimplePie is doing. I wish I could figure
>> it out for myself but my sleep indicators are overloaded.
>> Morbus' suggestion to pass along the string as a whole sounds logical; I'll
>> see what I can do about that. Although I really had assumed that aggregation
>> happens from XML only, so the module would need a considerable amount of
>> change to accommodate non-XML strings. I'll study the option and see what I
>> can do. Anyone care to give me a patch for my next birthday? :-P
>> On 6/20/07, Morbus Iff <morbus at disobey.com> wrote:
>> > > opinion the second is not sanitization and no aggregator needs to
>> > > the code and time on trying to handle non-XML or non-standards
>> > It depends entirely on your definition of "aggregator". In your module,
>> > you have only one parser, really - PHP's SimpleXML (or whatever it's
>> > called) that then sends the loaded data structure to the smaller "do
>> > things with it" (ie., RSS20.inc, etc.) subparsers. However, I'd think
>> > that it'd be far more flexible to send the raw strings around /as well/
>> > - then one could support, for example, non-XML documents (or, in my
>> > particular case, I could write scrapers for sites that don't support
>> > feeds [or feeds that contain useful data]) so that I'd be able to hook
>> > into the generic aggregating process. Aggregation != just XML, IMO.
>> > I'd love, for example, to be able to add a "feed" that points to (pff,
>> > making crap outta my ass here) some comic site's "latest comic" HTML,
>> > choose a custom-made parser that expects that HTML, and return the same
>> > data structure that the aggregation API expects as legit. This /is/
>> > aggregation - pulling disparate sources together.
>> > > I would be very surprised if I found that SimplePie is wasting 11
>> > > seconds out of 12 in preventing XSS or SQL injection attacks alone. But
>> > > hey, what do I know about SimplePie. Does anyone know what SimplePie
>> > > actually does within these 11 seconds?
>> > SimplePie's set_stupidly_fast is a wrapper around:
>> > $this->enable_order_by_date(false);
>> > $this->remove_div(false);
>> > $this->strip_comments(false);
>> > $this->strip_htmltags(false);
>> > $this->strip_attributes(false);
>> > $this->set_image_handler(false);
>> > None of those are "fix broken XML". I reran the initial test like so:
>> > $feed->set_stupidly_fast(TRUE);
>> > $feed->enable_order_by_date(TRUE);
>> > i.e. first shutting everything off, then enabling one command:
>> > $feed->enable_order_by_date(TRUE);   2 seconds
>> > $feed->remove_div(TRUE);             1 second
>> > $feed->strip_comments(TRUE);         2 seconds
>> > $feed->strip_htmltags(TRUE);         2 seconds
>> > $feed->strip_attributes(TRUE);       2 seconds
>> > $feed->set_image_handler(TRUE);      1 second
>> > --
>> > Morbus Iff ( if god is my witness, god must be blind )
>> > Technical: http://www.oreillynet.com/pub/au/779
>> > Culture: http://www.disobey.com/ and http://www.gamegrene.com/
>> > aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
NGP Software, Inc.
seanr at ngpsoftware.com