[development] Drupal module for scraping information from an HTML/XML document
james.benstead at gmail.com
Tue Nov 30 18:56:09 UTC 2010
I've finally got round to doing some serious work on Drupalversity, an open,
web-based Drupal education project I've had in mind for a year or so.
People who use Drupalversity to learn can add Resources to the site: links
to posts at Lullabot, Chapter3, etc. that explain how to do specific things
with Drupal. A Resource is a custom content type that includes a link to the
resource and a text field containing a description of it.
Once a Resource has been added to the site, I'd like to scrape certain
information from the page it links to. At this point I'm thinking of the
title of that page and the provider of the resource (i.e., which Drupal shop
originally created it). What's the best way to go about this? I'm pretty
sure there's no Drupal module that solves the problem out of the box.
So far I've considered:
- Drupal's built-in drupal_http_request() -
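A minimal sketch of the fetch-and-parse step, written in Python purely for
illustration; in a Drupal module the fetch would go through
drupal_http_request() and the parsing would be done in PHP. It takes the
text of the first <title> element and guesses the provider from the link's
hostname. The KNOWN_PROVIDERS table and all function names are hypothetical,
and treating the hostname as the provider is an assumption that will not
hold for syndicated or aggregated posts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace runs pages often put inside <title>.
    return " ".join(parser.title.split())


# Hypothetical host-to-provider table; extend it as Resources are added.
KNOWN_PROVIDERS = {
    "lullabot.com": "Lullabot",
    "chapterthree.com": "Chapter Three",
}


def guess_provider(url):
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Fall back to the bare hostname when the shop is not in the table.
    return KNOWN_PROVIDERS.get(host, host)


def scrape_resource(url):
    # Network access happens only here; the helpers above are pure.
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return {"title": extract_title(html), "provider": guess_provider(url)}
```

Keeping the parsing separate from the fetch means the title and provider
logic can be tested without network access, and the fetch could run from
cron rather than at save time so a slow remote site doesn't block the form.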
My IM and Skype details are at http://state68.com/contact