[development] cURL and drupal_http_request do not properly download certain Google News feeds
Alex Barth
alex at developmentseed.org
Tue Jan 19 19:15:52 UTC 2010
After getting a report that
http://news.google.com/news?pz=1&hl=ar&q=سوريا&cf=all&output=rss
is not downloading properly with the Feeds module, I dug deeper and
discovered that cURL and drupal_http_request() return an RSS feed with
no items, while wget and PHP's stream_get_contents() return a full
RSS feed with a number of items.
Details here: http://drupal.org/node/689552
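To compare what each client receives, you can count the <item> elements in each saved response. Below is a minimal sketch in Python using only the standard library. It assumes each client's output has been saved to a file, for example with `wget -O` or `curl -o`. The function name count_rss_items is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_rss_items(rss_bytes: bytes) -> int:
    """Count <item> elements under <channel> in an RSS 2.0 document.

    Takes raw bytes (as read from a saved file) so that the XML
    encoding declaration is honoured by the parser.
    """
    root = ET.fromstring(rss_bytes)
    channel = root.find("channel")
    # A feed without a <channel> element cannot carry any items.
    return 0 if channel is None else len(channel.findall("item"))
```

For example, open each saved file in binary mode ("rb") and pass `f.read()` to count_rss_items. An empty-items response from one client and a non-zero count from another would confirm the difference independently of any Drupal code.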
I am unsure what is actually causing this peculiar behavior, and I
would appreciate people's input. The issue affects not just Feeds but
any Drupal module that downloads and processes Google News RSS
feeds, including the core Aggregator module.
- This seems to be a case where Google News decides, based on some
request parameters, which content to return and which to omit. Or am I
missing something?
- The user agent is the same in the cases where the issue occurs and
where it doesn't, and I am using the same machine for all tests. What
else could Google use to distinguish my requests?
- Any tips on an 'HTTP monitor' I could use to inspect outgoing
HTTP requests from my local machine?
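As a low-effort alternative to a full HTTP monitor, you can point each client at a local server that echoes back the request line and headers it received. Below is a sketch using Python 3's standard http.server module. The class name EchoHandler, the helper make_server, and the default port 8080 are illustrative assumptions. Diffing the echoed output for cURL, wget, drupal_http_request() and PHP streams could reveal header differences beyond the user agent, such as Accept-Language or Accept-Encoding. It could also show whether a client percent-encodes the non-ASCII query string or sends it as raw bytes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Reply to every GET with the request line and headers the client sent."""

    def do_GET(self):
        # http.server decodes the request line as ISO-8859-1, so a query sent
        # as raw UTF-8 bytes shows up as mojibake, while a percent-encoded
        # query shows up as %XX sequences.
        lines = [self.requestline]
        lines += [f"{name}: {value}" for name, value in self.headers.items()]
        body = ("\n".join(lines) + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request log line on stderr.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    """Bind the echo server to localhost only; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), EchoHandler)
```

Start it with `make_server().serve_forever()`. Then request a path such as /news?q=سوريا with each client against http://127.0.0.1:8080 and compare the responses.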
Alex Barth
http://www.developmentseed.org/blog
tel (202) 250-3633