I second the recommendation to use QueryPath. I use it almost exclusively, together with drupal_http_request(), and use curl in only a few places (if you use curl, I recommend <a href="http://drupal.org/project/curl">http://drupal.org/project/curl</a> for a dependency check). What I'd really recommend, though, is creating a custom module that uses the above and contains your filtering logic. I've done this for about a dozen modules now. <div>
<br></div><div>That said, more modules are available nowadays, for example <a href="http://drupal.org/project/feeds_xpathparser">http://drupal.org/project/feeds_xpathparser</a> together with Feeds (<a href="http://drupal.org/project/feeds">http://drupal.org/project/feeds</a>). About a dozen other modules will also accomplish the goal. I haven't used those, but I did go through and try most of the methods for some recent projects. </div>
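<div>To illustrate the fetch-then-parse flow recommended above, here is a minimal, language-neutral sketch in Python rather than Drupal code. It assumes the page body has already been fetched (in a module, drupal_http_request() would do that), and it uses Python's standard HTMLParser as a stand-in for QueryPath. The names TitleParser and scrape_resource are hypothetical, and treating the link's host name as the "provider" is an assumption about what the site wants to store.</div>

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_resource(url, html):
    """Return the page title and the provider (host name) for a Resource link.

    `html` is the already-fetched page body; fetching is left to the caller.
    """
    parser = TitleParser()
    parser.feed(html)
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    # Collapse runs of whitespace so the title fits a single text field.
    return {"title": " ".join(parser.title.split()), "provider": host}
```

<div>The filtering logic (here, whitespace cleanup and stripping "www.") lives in one function, which is the part a custom module would own regardless of which parser it uses.</div>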
<div><br></div><div>Cheers,</div><div><br clear="all">Kevin O'Brien<div>Drupal Developer</div><div><a href="http://www.coderintherye.com" target="_blank">http://www.coderintherye.com</a></div><div>415-754-0112</div><br>
<br><br><div class="gmail_quote">On Tue, Nov 30, 2010 at 11:26 AM, <span dir="ltr"><<a href="mailto:development-request@drupal.org">development-request@drupal.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Today's Topics:<br>
<br>
1. Drupal module for scraping information from an HTML/XML<br>
document (James Benstead)<br>
2. Re: Drupal module for scraping information from an HTML/XML<br>
document (John Fiala)<br>
3. Easter problem (Ámon Tamás)<br>
4. Re: Easter problem (Carl Wiedemann)<br>
5. Re: Easter problem (<a href="mailto:larry@garfieldtech.com">larry@garfieldtech.com</a>)<br>
6. Re: Easter problem (<a href="mailto:jeff@ayendesigns.com">jeff@ayendesigns.com</a>)<br>
7. Re: Easter problem (<a href="mailto:larry@garfieldtech.com">larry@garfieldtech.com</a>)<br>
8. Re: Easter problem (Jennifer Hodgdon)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Tue, 30 Nov 2010 18:56:09 +0000<br>
From: James Benstead <<a href="mailto:james.benstead@gmail.com">james.benstead@gmail.com</a>><br>
Subject: [development] Drupal module for scraping information from an<br>
HTML/XML document<br>
To: development <<a href="mailto:development@drupal.org">development@drupal.org</a>><br>
Message-ID:<br>
<AANLkTi=<a href="mailto:AFhBkvyURzgwNB54Z%2Bq-rRj_B_uRLZbUUd3UV@mail.gmail.com">AFhBkvyURzgwNB54Z+q-rRj_B_uRLZbUUd3UV@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
I've finally got round to doing some serious work on Drupalversity, an open,<br>
web-based Drupal education project I've had in mind for a year or so.<br>
<br>
People who use Drupalversity to learn have the option of adding Resources to<br>
the site - i.e., links to posts at Lullabot, Chapter3 etc that explain how<br>
to do specific things with Drupal. A Resource is a custom content type that<br>
includes a link to the resource and a text field containing a description of<br>
that resource.<br>
<br>
What I'd like to do once a Resource has been added to the site is to scrape<br>
certain information from it: at this point I'm thinking the Title of the<br>
page the link points to and the provider of the resource - e.g., which<br>
Drupal shop originally created the resource. What's the best way to go about<br>
doing this? I'm pretty sure there's not a Drupal module that solves the<br>
problem out of the box.<br>
<br>
So far I've considered:<br>
<br>
- <a href="http://drupal.org/project/querypath" target="_blank">http://drupal.org/project/querypath</a><br>
- Drupal's built-in drupal_http_request() -<br>
<a href="http://api.drupal.org/api/drupal/includes--common.inc/function/drupal_http_request/6" target="_blank">http://api.drupal.org/api/drupal/includes--common.inc/function/drupal_http_request/6</a><br>
- curl<br>
<br>
Thanks,<br>
<br>
--Jim<br>
--<br>
My IM and Skype details are at <a href="http://state68.com/contact" target="_blank">http://state68.com/contact</a><br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Tue, 30 Nov 2010 12:06:33 -0700<br>
From: John Fiala <<a href="mailto:jcfiala@gmail.com">jcfiala@gmail.com</a>><br>
Subject: Re: [development] Drupal module for scraping information from<br>
an HTML/XML document<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID:<br>
<AANLkTi=<a href="mailto:N6WxHfigUC4ZopfxswMBv8bj7BZZJErHmko_T@mail.gmail.com">N6WxHfigUC4ZopfxswMBv8bj7BZZJErHmko_T@mail.gmail.com</a>><br>
Content-Type: text/plain; charset=ISO-8859-1<br>
<br>
These days, if I'm going to be trying to extract data from html/xml,<br>
I'd use querypath. Give it a try!<br>
<br>
On Tue, Nov 30, 2010 at 11:56 AM, James Benstead<br>
<<a href="mailto:james.benstead@gmail.com">james.benstead@gmail.com</a>> wrote:<br>
> What I'd like to do once a Resource has been added to the site is to scrape<br>
> certain information from it: at this point I'm thinking the Title of the<br>
> page the link points to and the provider of the resource - e.g., which<br>
> Drupal shop originally created the resource. What's the best way to go about<br>
> doing this? I'm pretty sure there's not a Drupal module that solves the<br>
> problem out of the box.<br>
<br>
--<br>
John Fiala<br>
<a href="http://www.jcfiala.net" target="_blank">www.jcfiala.net</a><br>
<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Tue, 30 Nov 2010 20:14:04 +0100<br>
From: Ámon Tamás <<a href="mailto:amont@5net.hu">amont@5net.hu</a>><br>
Subject: [development] Easter problem<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID:<br>
<<a href="mailto:AANLkTikmKoVkedks2FkWUbHRq9sNTe6r0iX%2BiMjmBtvy@mail.gmail.com">AANLkTikmKoVkedks2FkWUbHRq9sNTe6r0iX+iMjmBtvy@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
Hello,<br>
<br>
I have the nameday module (<a href="http://drupal.org/project/nameday" target="_blank">http://drupal.org/project/nameday</a>) and I got a<br>
feature request for Greek namedays. As I see it, they are based on Easter,<br>
which is not an easy thing to calculate.<br>
<br>
So I want to find an algorithm for Easter and similar movable days, whose results<br>
can be stored somehow. Maybe it should be a hook, or something else that<br>
can be stored in the database.<br>
<br>
<br>
Thanks<br>
<br>
--<br>
Ámon Tamás<br>
Site developer and programmer<br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Tue, 30 Nov 2010 12:22:42 -0700<br>
From: Carl Wiedemann <<a href="mailto:carl.wiedemann@gmail.com">carl.wiedemann@gmail.com</a>><br>
Subject: Re: [development] Easter problem<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID:<br>
<AANLkTinD9Xz=3inJj2GraAuqde_=<a href="mailto:3yshJDwxCJzu12zr@mail.gmail.com">3yshJDwxCJzu12zr@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-2"<br>
<br>
Does this help? <a href="http://php.net/manual/en/function.easter-days.php" target="_blank">http://php.net/manual/en/function.easter-days.php</a><br>
<br>
<br>
------------------------------<br>
<br>
Message: 5<br>
Date: Tue, 30 Nov 2010 13:24:07 -0600<br>
From: "<a href="mailto:larry@garfieldtech.com">larry@garfieldtech.com</a>" <<a href="mailto:larry@garfieldtech.com">larry@garfieldtech.com</a>><br>
Subject: Re: [development] Easter problem<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID: <<a href="mailto:4CF54F57.2030602@garfieldtech.com">4CF54F57.2030602@garfieldtech.com</a>><br>
Content-Type: text/plain; charset=UTF-8; format=flowed<br>
<br>
There's no need for a hook here at all. You can either code in the<br>
algorithm for defining when Easter is (which sounds like it is in fact<br>
rather complicated) or just pre-store known, pre-calculated dates for it<br>
for the next decade or so. (10 records, one per year; totally easy.)<br>
<br>
Both options are described here, including the different mechanisms for<br>
defining when Easter is in different calendars:<br>
<br>
<a href="http://en.wikipedia.org/wiki/Easter#Date_of_Easter" target="_blank">http://en.wikipedia.org/wiki/Easter#Date_of_Easter</a><br>
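<br>
Both options can be sketched in a few lines. The following is a minimal sketch in Python rather than PHP, assuming the Greek namedays follow Orthodox Easter. It uses Meeus's Julian algorithm and then shifts the result onto the Gregorian calendar. The names orthodox_easter and EASTER_TABLE are hypothetical, and the 2011-2020 range is just an example of "the next decade or so".<br>

```python
from datetime import date, timedelta


def orthodox_easter(year):
    """Gregorian calendar date of Orthodox Easter (Meeus's Julian algorithm)."""
    a = year % 4
    b = year % 7
    c = year % 19
    d = (19 * c + 15) % 30
    e = (2 * a + 4 * b - d + 34) % 7
    # month and day are expressed on the Julian calendar at this point.
    month, day = divmod(d + e + 114, 31)
    day += 1
    # Gap between the Julian and Gregorian calendars (13 days for 1900-2099).
    offset = year // 100 - year // 400 - 2
    return date(year, month, day) + timedelta(days=offset)


# Option two from the message: pre-compute one record per year and store it.
EASTER_TABLE = {year: orthodox_easter(year) for year in range(2011, 2021)}
```

The table holds ten records, one per year, which could be written to the database once instead of running the calculation on every request.<br>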
<br>
--Larry Garfield<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 6<br>
Date: Tue, 30 Nov 2010 14:23:56 -0500<br>
From: <a href="mailto:jeff@ayendesigns.com">jeff@ayendesigns.com</a><br>
Subject: Re: [development] Easter problem<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID: <<a href="mailto:4CF54F4C.2060409@ayendesigns.com">4CF54F4C.2060409@ayendesigns.com</a>><br>
Content-Type: text/plain; charset="utf-8"<br>
<br>
You can google it, but I believe this is one of those things that cannot<br>
be reduced to an equation or algorithm. It's something like the first<br>
Sunday after the first full moon after the spring equinox.<br>
<br>
<br>
------------------------------<br>
<br>
Message: 7<br>
Date: Tue, 30 Nov 2010 13:26:23 -0600<br>
From: "<a href="mailto:larry@garfieldtech.com">larry@garfieldtech.com</a>" <<a href="mailto:larry@garfieldtech.com">larry@garfieldtech.com</a>><br>
Subject: Re: [development] Easter problem<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID: <<a href="mailto:4CF54FDF.7070506@garfieldtech.com">4CF54FDF.7070506@garfieldtech.com</a>><br>
Content-Type: text/plain; charset=ISO-8859-2; format=flowed<br>
<br>
The Calendar PHP module is not enabled by default in a stock PHP, so I<br>
don't know that you can rely on it (unfortunately). It does have some<br>
cool stuff in it, though.<br>
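<br>
If the extension cannot be relied on, the arithmetic can be implemented directly. As a hedged illustration (in Python rather than PHP, and not the extension's own code), this sketch computes Western (Gregorian) Easter Sunday with the anonymous Gregorian algorithm, also known as Meeus/Jones/Butcher. The function name western_easter is hypothetical.<br>

```python
from datetime import date


def western_easter(year):
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within the century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)
```

This uses integer arithmetic only, so it needs no optional extension. It covers the Western date only and applies to Gregorian calendar years.<br>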
<br>
--Larry Garfield<br>
<br>
On 11/30/10 1:22 PM, Carl Wiedemann wrote:<br>
> Does this help? <a href="http://php.net/manual/en/function.easter-days.php" target="_blank">http://php.net/manual/en/function.easter-days.php</a><br>
<br>
<br>
------------------------------<br>
<br>
Message: 8<br>
Date: Tue, 30 Nov 2010 11:21:08 -0800<br>
From: Jennifer Hodgdon <<a href="mailto:yahgrp@poplarware.com">yahgrp@poplarware.com</a>><br>
Subject: Re: [development] Easter problem<br>
To: <a href="mailto:development@drupal.org">development@drupal.org</a><br>
Message-ID: <<a href="mailto:4CF54EA4.1050502@poplarware.com">4CF54EA4.1050502@poplarware.com</a>><br>
Content-Type: text/plain; charset=UTF-8; format=flowed<br>
<br>
<a href="http://php.net/manual/en/function.easter-date.php" target="_blank">http://php.net/manual/en/function.easter-date.php</a><br>
<br>
On 11/30/2010 11:14 AM, Ámon Tamás wrote:<br>
> I have the nameday module (<a href="http://drupal.org/project/nameday" target="_blank">http://drupal.org/project/nameday</a>) and I got a<br>
> feature request for Greek namedays. As I see it, they are based on Easter,<br>
> which is not an easy thing to calculate.<br>
><br>
> So I want to find an algorithm for Easter and similar movable days, whose results<br>
> can be stored somehow. Maybe it should be a hook, or something else that<br>
> can be stored in the database.<br>
<br>
--<br>
Jennifer Hodgdon * Poplar ProductivityWare<br>
<a href="http://www.poplarware.com" target="_blank">www.poplarware.com</a><br>
Drupal web sites and custom Drupal modules<br>
<br>
<br>
<br>
------------------------------<br>
<font color="#888888"><br>
--<br>
[ Drupal development list | <a href="http://lists.drupal.org/" target="_blank">http://lists.drupal.org/</a> ]<br>
<br>
End of development Digest, Vol 95, Issue 58<br>
*******************************************<br>
</font></blockquote></div><br></div>