Re: [development] Data import and synchronization to nodes

28 Aug 2007

For the most part, what Ken said --

Particularly wrt the FeedAPI project, which looks incredibly strong.

WRT courses, I'd actually recommend creating nodes on import -- this 
obviously depends on your use case, but in most situations where I've 
had to deal with this type of data, people are interacting with the 
courses almost immediately.

The other challenge will be to determine what constitutes an updated 
course, and what constitutes a new course. Toward this end, capture as 
much specific information as you can in the import (short, of course, 
of a specific ID for each course, which makes it all so much easier :) ) 
-- can you get semester info (i.e., fall 2007), instructor info, room 
location, description, etc.? With this in mind, you'll need to determine 
when an incoming course differs from what you already have, and whether 
that merits updating the existing node or creating a new one.
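
One way to picture the new-versus-updated decision is a sketch like the following. It is in Python rather than Drupal's PHP, and everything in it is an assumption for illustration: the field names (title, semester, instructor, room, description), the choice of which fields identify a course and which may change, and the helper names. It builds a natural key from the identifying fields to stand in for the missing course ID, and fingerprints the rest to detect changes.

```python
import hashlib

# Hypothetical field names; the real import may expose different ones.
KEY_FIELDS = ("title", "semester", "instructor")   # identify a course
CONTENT_FIELDS = ("room", "description")           # may change over time


def _norm(value):
    """Collapse case and whitespace so trivial differences do not look like new courses."""
    return " ".join(str(value or "").lower().split())


def course_key(record):
    """Natural key standing in for the missing per-course ID."""
    return tuple(_norm(record.get(f)) for f in KEY_FIELDS)


def content_hash(record):
    """Fingerprint of the fields that can change without it being a different course."""
    joined = "\x1f".join(_norm(record.get(f)) for f in CONTENT_FIELDS)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def classify(record, existing):
    """Return 'new', 'updated', or 'unchanged'.

    `existing` maps course_key -> content_hash for courses already imported.
    """
    key = course_key(record)
    if key not in existing:
        return "new"
    if existing[key] != content_hash(record):
        return "updated"
    return "unchanged"
```

Under these assumptions a room change counts as an update to the existing node, while a different semester or instructor produces a new one. Where that line falls is exactly the use-case question above.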

This also gets at your data structure for courses, and how granular it 
is -- how much info is stored along with a course, and how much is 
stored as separate nodes, or within separate tables?

For example, does a course contain semester info? Room info?

Anyways -- I look forward to hearing the solution you choose.

Cheers,

Bill

Ken Rickard wrote:
...
@Earnie: Take a look at the FeedAPI SoC project, it includes pluggable 
XML parsing and node creation hooks.
@Larry-
My preference here is the old 'lazy instantiation' trick. [1]
Import the data and write a callback that will present the table view 
of courses, etc.  You're dealing with structured data, so your 
callbacks should make it easy for people to browse the data (think MVC).
Keep a {data_node} lookup table.
When writing links for individual items, check the {data_node} table.  
If found, write the link to node/NID, otherwise, write it to a 
node-generating callback that also inserts a record into {data_node}.
This way, you only create nodes when your users want to interact with 
them.  Saves lots of processing overhead.
I have some sample code if you need it.
One drawback: if you want the data to be searchable, you either have 
to initiate your own hook_search, or wait for the nodes to be created.
- Ken
[1] http://barcelona2007.drupalcon.org/node/58
On 8/28/07, Earnie Boyd <earnie@users.sourceforge.net> wrote:
Quoting Larry Garfield <larry@garfieldtech.com>:
>
> So, I toss the brain-teaser out there: Is there a good way to have my nodes
> and import them too, or are these cases where nodes are simply the wrong tool
> and the direct-import-and-cache mechanisms described above are the optimal
> solutions?
>
Not that I've found and I've spent several hours recently researching
this.  Chris Mellor and I have begun collaborating on this issue here
http://portallink.linkmatics.com/gdf and have development staging here
http://datafeed.progw.org.  Help is welcome, we want to be able to feed
all types of external data.  Goals are being established and documented
on the http://portallink.linkmatics.com/gdf pages.  *Note* we are aware
of all the existing modules and API and our plans are to use the
existing things as well as create what is missing.

I've found http://drupal.org/project/feedparser which will accept RSS,
RDF or ATOM feeds and create nodes or aggregated lists.  I am
successfully using that module with a change documented in issue
http://drupal.org/node/169865 at http://give-me-an-offer.com.
Earnie
-- 
Bill Fitzgerald
http://www.funnymonkey.com
Tools for Teachers
503.897.7160