[development] Data import and synchronization to nodes

Tue Aug 28 13:33:59 UTC 2007

For the most part, I agree with what Ken said, particularly wrt the
FeedAPI project, which looks incredibly strong.

WRT courses, I'd actually recommend creating nodes on import. This 
obviously depends on your use case, but in most situations where I've 
had to deal with this type of data, people are interacting with the 
courses almost immediately, so deferring node creation doesn't save much.

The other challenge will be to determine what constitutes an updated 
course, and what constitutes a new course. Toward this end, capture as 
much specific information as you can in the import. A specific ID for 
each course makes it all so much easier :) but short of that, can you 
get semester info (e.g., fall 2007), instructor info, room location, 
description, etc.? With this in mind, you'll need to determine when a 
course is new, and whether a change merits updating the existing node 
or creating a new one.
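
A rough sketch of that new-versus-updated decision, written in Python
rather than Drupal code so it stays self-contained. Everything here is an
assumption for illustration: the field names (title, semester, instructor,
room), the function names, and the choice of which fields make up the
identifying key. It only shows the idea of matching on a composite key
when the source has no course ID.

```python
def course_key(row):
    """Build a composite key from the fields assumed to identify a course.

    Which fields belong here is a judgment call; title + semester +
    instructor is only an example.
    """
    return tuple(
        row[name].strip().lower()
        for name in ("title", "semester", "instructor")
    )


def sync_courses(existing, imported):
    """Classify imported rows as created, updated, or unchanged.

    `existing` maps composite key -> stored row and is modified in place,
    standing in for whatever storage holds the course nodes.
    """
    result = {"created": [], "updated": [], "unchanged": []}
    for row in imported:
        key = course_key(row)
        stored = existing.get(key)
        if stored is None:
            # No match on the key: treat this as a new course.
            existing[key] = dict(row)
            result["created"].append(key)
        elif any(stored.get(name) != value for name, value in row.items()):
            # Same key, but some non-key field changed: update in place.
            stored.update(row)
            result["updated"].append(key)
        else:
            result["unchanged"].append(key)
    return result
```

With a key like this, a room change for the same course and semester comes
out as an update, while the same title in a different semester comes out
as a new course. If the instructor is part of the key, a change of
instructor also creates a new course, which may or may not be what you
want.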

This also gets at your data structure for courses, and how granular it 
is -- how much info is stored along with a course, and how much is 
stored as separate nodes, or within separate tables?

For example, does a course contain semester info? Room info?

Anyways -- I look forward to hearing the solution you choose.

Cheers,

Bill

Ken Rickard wrote:
> @Earnie: Take a look at the FeedAPI SoC project, it includes pluggable 
> XML parsing and node creation hooks.
>
> @Larry-
>
> My preference here is the old 'lazy instantiation' trick. [1]
>
> Import the data and write a callback that will present the table view 
> of courses, etc.  You're dealing with structured data, so your 
> callbacks should make it easy for people to browse the data (think MVC).
>
> Keep a {data_node} lookup table.
>
> When writing links for individual items, check the {data_node} table.  
> If found, write the link to node/NID, otherwise, write it to a 
> node-generating callback that also inserts a record into {data_node}.
>
> This way, you only create nodes when your users want to interact with 
> them.  Saves lots of processing overhead.
>
> I have some sample code if you need it.
>
> One drawback: if you want the data to be searchable, you either have 
> to implement your own hook_search, or wait for the nodes to be created.
>
>
> - Ken
>
> [1] http://barcelona2007.drupalcon.org/node/58
>
>
> On 8/28/07, Earnie Boyd <earnie at users.sourceforge.net> wrote:
>
>
>
>     Quoting Larry Garfield <larry at garfieldtech.com>:
>
>     >
>     > So, I toss the brain-teaser out there: Is there a good way to have
>     > my nodes and import them too, or are these cases where nodes are
>     > simply the wrong tool and the direct-import-and-cache mechanisms
>     > described above are the optimal solutions?
>     >
>
>     Not that I've found and I've spent several hours recently researching
>     this.  Chris Mellor and I have begun collaborating on this issue here
>     http://portallink.linkmatics.com/gdf and have development staging here
>     http://datafeed.progw.org.  Help is welcome, we want to be able to feed
>     all types of external data.  Goals are being established and documented
>     on the http://portallink.linkmatics.com/gdf pages.  *Note* we are aware
>     of all the existing modules and API and our plans are to use the
>     existing things as well as create what is missing.
>
>     I've found http://drupal.org/project/feedparser which will accept RSS,
>     RDF or ATOM feeds and create nodes or aggregated lists.  I am
>     successfully using that module with a change documented in issue
>     http://drupal.org/node/169865 at http://give-me-an-offer.com.
>
>     Earnie
>
>

-- 
Bill Fitzgerald
http://www.funnymonkey.com
Tools for Teachers
503.897.7160