[development] PHP 5 > aggregator.module rewrite to XML API?

Morbus Iff morbus at disobey.com
Wed Jun 20 01:28:25 UTC 2007


> opinion the second is not sanitization and no aggregator needs to waste 
> the code and time on trying to handle non-XML or non-standards compliant 

It depends entirely on your definition of "aggregator". In your module, 
you really have only one parser - PHP's SimpleXML - which then hands the 
loaded data structure to the smaller "do things with it" subparsers 
(i.e., RSS20.inc, etc.). However, I'd think it'd be far more flexible to 
send the raw strings around /as well/ - then one could support, for 
example, non-XML documents (or, in my particular case, I could write 
scrapers for sites that don't offer feeds [or whose feeds don't contain 
useful data]) and still hook into the generic aggregating process. 
Aggregation != just XML, IMO.
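
To make "send the raw strings around" concrete, here's a rough sketch of 
what I mean. Every name in it (example_parsers, example_parse_raw, the 
'plain_lines' parser, the item keys) is made up for illustration and is 
not anything in aggregator.module; the only point is that each parser 
receives the unparsed response body and returns a list of items.

```php
<?php
// Hypothetical registry of parsers, keyed by name. Each parser is a
// callable taking (raw body, source URL) and returning an array of items.
function example_parsers($name = NULL, $callback = NULL) {
  static $parsers = array();
  if ($name !== NULL && $callback !== NULL) {
    $parsers[$name] = $callback;
  }
  return $parsers;
}

// Hand the raw, unparsed body to whichever parser the feed is configured
// to use. Unknown parser names yield no items.
function example_parse_raw($parser_name, $raw, $url) {
  $parsers = example_parsers();
  if (!isset($parsers[$parser_name])) {
    return array();
  }
  return call_user_func($parsers[$parser_name], $raw, $url);
}

// A deliberately non-XML parser: one item per non-empty line of text.
function example_parse_plain_lines($raw, $url) {
  $items = array();
  foreach (preg_split('/\r\n|\r|\n/', $raw) as $line) {
    $line = trim($line);
    if ($line !== '') {
      $items[] = array('title' => $line, 'link' => $url, 'description' => '');
    }
  }
  return $items;
}

example_parsers('plain_lines', 'example_parse_plain_lines');
```

An XML parser would register the same way and simply start by loading 
the raw string itself, so nothing is lost for the standards-compliant 
case.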

I'd love, for example, to be able to add a "feed" that points to (pff, 
making crap outta my ass here) some comic site's "latest comic" HTML, 
choose a custom-made parser that expects that HTML, and have it return 
the same data structure that the aggregation API accepts as legit. This 
/is/ aggregation - pulling disparate sources together.
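
A scraper like that could be as small as the sketch below. The page 
markup it expects (a <title> plus an <img class="comic">), the function 
name, and the item keys are all assumptions for the sake of the example; 
a real site would need its own patterns, and the item structure would be 
whatever the aggregation API ends up defining.

```php
<?php
// Hypothetical scraper for an imaginary comic page. It takes the raw HTML
// and the page URL, and returns zero or one items.
function example_parse_comic_html($raw, $url) {
  $item = array('title' => '', 'link' => $url, 'description' => '');

  // Use the page title as the item title, decoded to plain text.
  if (preg_match('#<title>(.*?)</title>#is', $raw, $m)) {
    $item['title'] = trim(html_entity_decode(strip_tags($m[1]), ENT_QUOTES, 'UTF-8'));
  }

  // Find the comic image tag, then pull its src attribute out.
  if (!preg_match('#<img\b[^>]*\bclass="comic"[^>]*>#i', $raw, $tag)) {
    return array();
  }
  if (!preg_match('#\bsrc="([^"]+)"#i', $tag[0], $src)) {
    return array();
  }

  // Rebuild the markup rather than passing the site's HTML through as-is.
  $item['description'] = '<img src="' . htmlspecialchars($src[1], ENT_QUOTES, 'UTF-8') . '" alt="" />';
  return array($item);
}
```

Regexes are a blunt tool for HTML, but for a single known page layout 
they keep the sketch short; the interesting part is that the return value 
looks the same as what an RSS subparser would hand back.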

> I would be very surprised if I found that SimplePie is wasting 11 
> seconds out of 12 in preventing XSS or SQL injection attacks alone. But 
> hey, what do I know about SimplePie. Does anyone know what SimplePie 
> actually does within these 11 seconds?

SimplePie's set_stupidly_fast is a wrapper around:

   $this->enable_order_by_date(false);
   $this->remove_div(false);
   $this->strip_comments(false);
   $this->strip_htmltags(false);
   $this->strip_attributes(false);
   $this->set_image_handler(false);

None of those are "fix broken XML". I reran the initial test like so:

   $feed->set_stupidly_fast(TRUE);
   $feed->enable_order_by_date(TRUE);

i.e., first shutting everything off, then re-enabling one option at a 
time. Each run:

   $feed->enable_order_by_date(TRUE);      2 seconds
   $feed->remove_div(TRUE);                1 second
   $feed->strip_comments(TRUE);            2 seconds
   $feed->strip_htmltags(TRUE);            2 seconds
   $feed->strip_attributes(TRUE);          2 seconds
   $feed->set_image_handler(TRUE);         1 second
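
For anyone wanting to repeat this, the loop can be sketched roughly as 
below. This is not my actual test script; example_time_options, 
$make_feed and $run are hypothetical names. $make_feed is assumed to 
return a fresh SimplePie object with its feed URL already set, and $run 
is assumed to do whatever work you want timed (e.g. init() and walking 
the items).

```php
<?php
// For each option: build a fresh feed object, switch everything off with
// set_stupidly_fast(TRUE), switch that one option back on, then time $run.
// $make_feed and $run are caller-supplied callables (hypothetical helpers).
function example_time_options($make_feed, $run, $options) {
  $timings = array();
  foreach ($options as $method) {
    $feed = call_user_func($make_feed);
    $feed->set_stupidly_fast(TRUE);
    $feed->$method(TRUE);

    $start = microtime(TRUE);
    call_user_func($run, $feed);
    $timings[$method] = microtime(TRUE) - $start;
  }
  // Map of option method name => elapsed seconds (float).
  return $timings;
}
```

Passing the six method names listed above as $options gives one timing 
per option, each measured with the other five still disabled.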

-- 
Morbus Iff ( if god is my witness, god must be blind )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus

