[development] Data import and synchronization to nodes

Larry Garfield larry at garfieldtech.com
Tue Aug 28 14:55:47 UTC 2007


Hi Ken.  Yes, I'm using your lazy-import method now for another project and it's working well.  The problem here is that not just the data but the structure of the data may change, and I need to be able to change it on demand, including en masse.

Lazy-import could work for the DocBook case, but what happens when a section is moved to another chapter?  How does the code know that the new section is actually the old node, so that it can be moved in the page tree and regenerated, and every page around it regenerated as well (to rebuild the next/prev links)?  The only unique identifier for it would be its XPath address, but that is exactly what just changed.  I don't know if I can require an ID on every element, as that would run into huge collision problems in some cases (e.g., /installation/gettingstarted vs. /writingmodule/gettingstarted both wanting a "gettingstarted" ID).  That causes me grief in the current DocBook system I hope to replace this way.
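
To make the identity problem concrete: the only mapping I could keep between the XML and the nodes would have to be keyed on the XPath address, roughly like this (the {docbook_node} table and the $xpath variable are just illustration, nothing that actually exists):

  // Look up the node for an incoming section by its address in the document.
  $nid = db_result(db_query("SELECT nid FROM {docbook_node} WHERE xpath = '%s'", $xpath));
  // If the section has moved to another chapter, $xpath has changed, so this
  // lookup misses, and the importer can't tell "moved section" apart from
  // "new section here, old section deleted there".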

For courses, I have no guarantee that a course will even exist in the new import.  While a course does have a Course ID, and sections have a Section ID, Term, etc. (and joins without a surrogate key get ugly when there are four values in the primary key, which is exactly the system I am replacing with Drupal), I would need to detect and immediately delete courses not in the new CSV, so I'd be parsing the whole thing anyway.
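
In other words, every import ends up being a full pass over the file no matter what, something like this (a rough Drupal 5-ish sketch; the {course_node} lookup table and the CSV layout are made up for illustration):

  $seen = array();
  $handle = fopen($csv_file, 'r');
  while (($row = fgetcsv($handle)) !== FALSE) {
    list($course_id, $title) = $row;
    $seen[$course_id] = TRUE;
    // ... create or update the node for this course here ...
  }
  fclose($handle);

  // Any course we already have a node for that did not appear in the file
  // has been dropped, so its node has to be deleted right away.
  $result = db_query("SELECT course_id, nid FROM {course_node}");
  while ($record = db_fetch_object($result)) {
    if (!isset($seen[$record->course_id])) {
      node_delete($record->nid);
    }
  }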

That's why I don't think the usual lazy-import method would work here.
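
For anyone following along, the lazy pattern Ken describes below boils down to roughly this ({data_node} is his table name; the function, the item_id column, and the callback path are my own invention):

  function mymodule_item_link($item_id, $title) {
    // If a node has already been generated for this item, link straight to it.
    $nid = db_result(db_query("SELECT nid FROM {data_node} WHERE item_id = %d", $item_id));
    if ($nid) {
      return l($title, 'node/'. $nid);
    }
    // Otherwise, link to a callback that builds the node on first view and
    // records the new nid in {data_node}.
    return l($title, 'mymodule/item/'. $item_id);
  }

That works great when the source data holds still, but both of my cases need the whole data set reconciled up front.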

--Larry Garfield

On Tue, 28 Aug 2007 09:08:10 -0400, "Ken Rickard" <agentrickard at gmail.com> wrote:
> @Earnie: Take a look at the FeedAPI SoC project; it includes pluggable XML
> parsing and node-creation hooks.
> 
> @Larry-
> 
> My preference here is the old 'lazy instantiation' trick. [1]
> 
> Import the data and write a callback that will present the table view of
> courses, etc.  You're dealing with structured data, so your callbacks
> should make it easy for people to browse the data (think MVC).
> 
> Keep a {data_node} lookup table.
> 
> When writing links for individual items, check the {data_node} table.  If
> found, write the link to node/NID; otherwise, write it to a node-generating
> callback that also inserts a record into {data_node}.
> 
> This way, you only create nodes when your users want to interact with them.
> Saves lots of processing overhead.
> 
> I have some sample code if you need it.
> 
> One drawback: if you want the data to be searchable, you either have to
> implement your own hook_search, or wait for the nodes to be created.
> 
> 
> - Ken
> 
> [1] http://barcelona2007.drupalcon.org/node/58
> 
> 
> On 8/28/07, Earnie Boyd <earnie at users.sourceforge.net> wrote:
>>
>>
>>
>> Quoting Larry Garfield <larry at garfieldtech.com>:
>>
>> >
>> > So, I toss the brain-teaser out there: Is there a good way to have my
>> > nodes and import them too, or are these cases where nodes are simply the
>> > wrong tool and the direct-import-and-cache mechanisms described above are
>> > the optimal solutions?
>> >
>>
>> Not that I've found, and I've spent several hours recently researching
>> this.  Chris Mellor and I have begun collaborating on this issue at
>> http://portallink.linkmatics.com/gdf and have a development staging site at
>> http://datafeed.progw.org.  Help is welcome; we want to be able to feed
>> all types of external data.  Goals are being established and documented
>> on the http://portallink.linkmatics.com/gdf pages.  *Note* we are aware
>> of all the existing modules and APIs, and our plan is to use the
>> existing pieces as well as create what is missing.
>>
>> I've found http://drupal.org/project/feedparser, which will accept RSS,
>> RDF, or Atom feeds and create nodes or aggregated lists.  I am
>> successfully using that module, with a change documented in issue
>> http://drupal.org/node/169865, at http://give-me-an-offer.com.
>>
>> Earnie
>>
> 
> 


