stripping tags during aggregation?
Hello all,
I've recently been going over the question of whether it's useful to preserve the HTML tags that are part of aggregated content. Rather than dive into this on my own, I wanted to see the consensus on this issue from others who probably have more experience with it than I do.
If the <content> or <summary> tags in an Atom feed contain an <img> tag, I was always pleased to find the image showing up inline in my aggregated content, but what about formatting tags? Allowing arbitrary markup could open the door to XSS attacks, as was noted in a previous thread.
Question: Should all tags in aggregated content be stripped? If not, then what tags should be allowed? If I use filter_xss then what tags should I allow? Is there some specification or article on what HTML tags should be allowed to go through? How do aggregation module authors handle this or advise that it be handled?
I really appreciate all feedback on this issue. Thanks :)
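To make the allowlist idea in the question concrete, here is a minimal Python sketch. It is not Drupal's filter_xss and not how any aggregation module does it; the tag list, the attribute list, and the names `AllowlistFilter` and `filter_feed_html` are assumptions chosen purely for illustration. It keeps a few formatting tags and <img>, drops every other tag while keeping its text, discards <script>/<style> content entirely, and only lets http(s) URLs through in href/src.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only; not a recommendation from the thread.
ALLOWED_TAGS = {"a", "em", "strong", "p", "blockquote", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
SAFE_SCHEMES = ("http://", "https://")
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}
# Allowed elements that never get a closing tag.
VOID_TAGS = {"img"}


class AllowlistFilter(HTMLParser):
    """Rebuilds markup from scratch, emitting only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep any text inside it
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops event handlers such as onclick/onerror
            if name in ("href", "src") and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other unexpected URL schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped from the parser, so escape it again on output.
            self.out.append(escape(data, quote=False))


def filter_feed_html(markup):
    """Return markup reduced to the allowlisted tags and attributes."""
    parser = AllowlistFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(filter_feed_html('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>'))
```

The point of rebuilding the output rather than deleting "bad" pieces is that anything not explicitly allowed never reaches the page; which tags belong on the list is exactly the open question of this thread.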
Question: Should all tags in aggregated content be stripped? If not, then what tags should be allowed? If I use filter_xss then what tags should I allow? Is there some specification or article on what HTML tags should be allowed to go through? How do aggregation module authors handle this or advise that it be handled?
One of the feature requests for aggregator.module has, for a while, been to allow the user to decide what tags are allowed in the feed (this may actually have happened already - I haven't checked in a while). Ultimately, the ideal would be to hook aggregator up to the Drupal input formats - then a user could define as many aggregator-related import formats as they'd like, and assign them however they'd like to whatever feeds they have ("this one allows images", "this one doesn't" ... "ooh, this one always does things in a blockquote", etc.)
-- Morbus Iff ( omnia mutantur, nihil interit )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
aim: akaMorbus / skype: morbusiff / icq: 2927491 / jabber.org: morbus
On Jun 26, 2007, at 5:34 AM, Morbus Iff wrote:
One of the feature requests for aggregator.module has, for a while, been to allow the user to decide what tags are allowed in the feed (this may actually have happened already - I haven't checked in a while). Ultimately, the ideal would be to hook aggregator up to the Drupal input formats - then a user could define as many aggregator-related import formats as they'd like, and assign them however they'd like to whatever feeds they have ("this one allows images", "this one doesn't" ... "ooh, this one always does things in a blockquote", etc.)
This feature already exists. http://yourdomain.com/admin/content/aggregator/settings has a field, "Allowed HTML tags:" where you can designate what tags shall be passed through from feeds into your site.
Laura
On 6/26/07, Laura Scott <laura@pingv.com> wrote:
This feature already exists. http://yourdomain.com/admin/content/aggregator/settings has a field, "Allowed HTML tags:" where you can designate what tags shall be passed through from feeds into your site.
Except it's global, instead of per-feed. And it doesn't use input formats, it uses a single strip tags entry.
So, short answer is, make it configurable. If I want to add a feed from another site (that I own...) that has all tags and javascript and embed codes and everything....then I should be able to.
-- Boris Mann
Office 604-682-2889
Skype borismann
http://www.bryght.com
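The per-feed idea discussed here (a global default, with individual feeds assigned their own format) could be sketched roughly as below. This is plain Python rather than Drupal code, and every name in it (`InputFormat`, `FORMATS`, `FEED_FORMATS`, `format_for_feed`, the example feed URLs) is hypothetical; it only shows the lookup, not the filtering itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputFormat:
    """A named filtering policy that can be assigned to one or more feeds."""
    name: str
    allowed_tags: frozenset
    trusted: bool = False  # a trusted feed is passed through without stripping


# Site-wide formats; "default" plays the role of today's single global setting.
FORMATS = {
    "default": InputFormat("default", frozenset({"a", "em", "strong"})),
    "with_images": InputFormat("with_images", frozenset({"a", "em", "strong", "img"})),
    "own_site": InputFormat("own_site", frozenset(), trusted=True),
}

# Per-feed assignments (hypothetical URLs); feeds not listed use the default.
FEED_FORMATS = {
    "http://example.com/photos.atom": "with_images",
    "http://example.org/my-own-site.atom": "own_site",
}


def format_for_feed(feed_url):
    """Return the format assigned to a feed, falling back to the global default."""
    return FORMATS[FEED_FORMATS.get(feed_url, "default")]


if __name__ == "__main__":
    for url in ("http://example.com/photos.atom", "http://example.net/other.rss"):
        fmt = format_for_feed(url)
        print(url, "->", fmt.name, sorted(fmt.allowed_tags), "trusted" if fmt.trusted else "")
```

Keeping the global entry as the fallback means existing feeds behave as before, and only feeds that are explicitly reassigned (for example, one from a site you own) get a looser policy.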
On Jun 26, 2007, at 9:46 AM, Boris Mann wrote:
Except it's global, instead of per-feed. And it doesn't use input formats, it uses a single strip tags entry.
So, short answer is, make it configurable. If I want to add a feed from another site (that I own...) that has all tags and javascript and embed codes and everything....then I should be able to.
That indeed would be ideal. Laura http://pingv.com
On Tuesday 26 June 2007, Boris Mann wrote:
So, short answer is, make it configurable. If I want to add a feed from another site (that I own...) that has all tags and javascript and embed codes and everything....then I should be able to.
Feedparser brings the feeds in as nodes (if you use the node_feed module on top of this library, that is), thus allowing input formats.
Bèr
-- Drupal, Ruby on Rails and Joomla! development: webschuur.com | Drupal hosting: www.sympal.nl
Feedparser brings the feeds in as nodes (if you use the node_feed module on top of this library, that is), thus allowing input formats.
Which presumes you want them as nodes. I, and a few others, don't. I don't mind that other people do, as I will delight in spanking them for linkrot. I'll support an API that lets me choose (FP and SF both do, and hopefully Aron's SOC too) on a per-feed basis.
Morbus Iff morbus@disobey.com
On Wednesday 27 June 2007, Morbus Iff wrote:
I'll support an API that lets me choose
feedparser is no more than an API, with pretty good storage of links (I would like to see it coupled to link bundle to allow scheduled checks for linkrot, 301s, etc.).
But I don't see why the database model would prevent you from linkrot.
In any case: feedparser is an API; you can bolt your own stuff on top, including the default Drupal feed storage/presentation, but also including nodes, which hands you all the goodness of nodes, such as, but certainly not limited to, stripping tags based on input filters.
Bèr
I chose to bring items in as nodes because then you can use so many of the node features: editing an aggregated node, using all modules that affect nodes, attaching comments, user ownership, categorizing them through the taxonomy system, publishing, moderation, revisions (I don't implement revisions yet, but it sounds intriguing to play with the thought), sticky, etc.
This is what popped to mind in an instant. I could also see integration with other node-oriented modules such as Views, some vague and definitely undeveloped CCK integration, and Pathauto comes to mind as well. Aggregating feeds in a different way would make them a totally foreign entity that needs special hooks, special handling, and new modules that use a new API. I'm just stating the thoughts I had when I reached the conclusion to aggregate them as nodes.
Thanks for the suggestion on how to handle input. Filtering input through the node filter system sounds ideal. I never would have reached this on my own.
I'll take the opportunity to ask if there are any developers here interested in co-maintaining the aggregation module with me; you can drop me a line off list. I really need extra skilled people here so as to not screw up my module :P Just go take a look at the pending patches and feature requests and you're going to drool at the beauty ;) You'll also know why a co-maintainer sounds like a good idea. Sponsors don't need any invitations, they're always welcome hehe :P
On 6/27/07, Bèr Kessels <ber@webschuur.com> wrote:
feedparser is no more than an API, with pretty good storage of links (I would like to see it coupled to link bundle to allow scheduled checks for linkrot, 301s, etc.).
But I don't see why the database model would prevent you from linkrot.
In any case: feedparser is an API; you can bolt your own stuff on top, including the default Drupal feed storage/presentation, but also including nodes, which hands you all the goodness of nodes, such as, but certainly not limited to, stripping tags based on input filters.
Bèr
Importing feeds as nodes would also offer a very clean and flexible way of pushing data between sister-sites. I've never had much luck with pub/sub, but a feed-based solution would offer a lot of flexibility. (Imagine building your outgoing feed using Views. You could export virtually any combination of nodes for a sister site to consume into mirrored nodes as well.)
On Wednesday 27 June 2007, Ashraf Amayreh wrote:
I chose to bring items in as nodes because then you can use so many of the node features: editing an aggregated node, using all modules that affect nodes, attaching comments, user ownership, categorizing them through the taxonomy system, publishing, moderation, revisions (I don't implement revisions yet, but it sounds intriguing to play with the thought), sticky, etc.
This is what popped to mind in an instant. I could also see integration with other node-oriented modules such as Views, some vague and definitely undeveloped CCK integration, and Pathauto comes to mind as well. Aggregating feeds in a different way would make them a totally foreign entity that needs special hooks, special handling, and new modules that use a new API. I'm just stating the thoughts I had when I reached the conclusion to aggregate them as nodes.
Thanks for the suggestion on how to handle input. Filtering input through the node filter system sounds ideal. I never would have reached this on my own.
I'll take the opportunity to ask if there are any developers here interested in co-maintaining the aggregation module with me; you can drop me a line off list. I really need extra skilled people here so as to not screw up my module :P Just go take a look at the pending patches and feature requests and you're going to drool at the beauty ;) You'll also know why a co-maintainer sounds like a good idea. Sponsors don't need any invitations, they're always welcome hehe :P
-- Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012 "If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
participants (6)
- Ashraf Amayreh
- Boris Mann
- Bèr Kessels
- Larry Garfield
- Laura Scott
- Morbus Iff