[development] Data import and synchronization to nodes

Ken Rickard agentrickard at gmail.com
Tue Aug 28 23:14:54 UTC 2007


Questions like these are why I'm no good with edge cases or race conditions
:-)

- Ken
(not a CS guy, just a problem solver)

On 8/28/07, Larry Garfield <larry at garfieldtech.com> wrote:
>
>
> Hi Ken.  Yes, I'm using your lazy-import method now for another project
> and it's working well.  The problem here is that not just the data but the
> structure of the data may change, and I need to be able to change it
> on demand, including en masse.
>
> Lazy-import could work for the Docbook case, but what happens then when a
> section is moved to another chapter?  How does the code know that the new
> section is actually the old node, so it now has to be moved in the page tree
> and regenerated, and every page around it has to be regenerated (for
> next/prev links to be rebuilt)?  The only unique identifier for it would be
> its XPath address, but that just changed.  I don't know if I can require an
> ID on every element, as that would run into huge collision problems in some
> cases.  (e.g., /installation/gettingstarted vs
> /writingmodule/gettingstarted.  That causes me grief in the current DocBook
> system I hope to replace this way.)
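[Editorial aside: Larry's identity problem can be made concrete. The following is a minimal sketch in Python (not Drupal/PHP) using the standard-library ElementTree, with invented chapter/section ids; it shows why a path-derived key stops matching the moment a section moves to another chapter.]

```python
import xml.etree.ElementTree as ET

def path_key(root, target):
    """Derive a simple XPath-like key for target by walking down from root."""
    def walk(elem, trail):
        key = trail + "/" + elem.get("id", elem.tag)
        if elem is target:
            return key
        for child in elem:
            found = walk(child, key)
            if found:
                return found
        return None
    return walk(root, "")

book = ET.fromstring(
    '<book><chapter id="installation">'
    '<section id="gettingstarted"/></chapter>'
    '<chapter id="writingmodule"/></book>')
sec = book.find('.//section')
print(path_key(book, sec))   # /book/installation/gettingstarted

# Move the section to the other chapter: the derived key changes, so it can
# no longer be matched against the node created from the earlier import.
book.find('chapter[@id="installation"]').remove(sec)
book.find('chapter[@id="writingmodule"]').append(sec)
print(path_key(book, sec))   # /book/writingmodule/gettingstarted
```

Note that the bare id "gettingstarted" stays stable across the move, but, as Larry says, bare ids collide across chapters; the path disambiguates them at the cost of changing on every move.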
>
> For courses, I have no guarantee that a course will even exist in the new
> import.  While it does have a Course ID, and sections have a Section ID,
> Term, etc. (and joins without a surrogate key get ugly when there are 4 values
> in the primary key, which is the system I am replacing with Drupal), I would
> need to detect and immediately delete courses not in the new CSV, so I'd be
> parsing the whole thing anyway.
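[Editorial aside: the full-parse sync Larry describes (delete anything absent from the new CSV) reduces to a set difference over course IDs. A hedged sketch in Python, with an invented `course_id` column name standing in for the real Course ID field:]

```python
import csv
import io

def sync_courses(existing_ids, csv_text):
    """Return (incoming_ids, stale_ids) given the IDs of currently stored
    courses and the newly imported CSV."""
    incoming = {row["course_id"] for row in csv.DictReader(io.StringIO(csv_text))}
    stale = set(existing_ids) - incoming   # courses gone from the new import
    return incoming, stale

csv_text = "course_id,title\nCS101,Intro\nCS201,Data Structures\n"
keep, delete = sync_courses({"CS101", "CS305"}, csv_text)
print(sorted(delete))   # ['CS305'] -- not in the new CSV, so it gets removed
```

This is exactly why lazy import does not fit the course case: computing `stale` requires reading the whole CSV up front anyway.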
>
> That's why I don't think the usual lazy-import method would work here.
>
> --Larry Garfield
>
> On Tue, 28 Aug 2007 09:08:10 -0400, "Ken Rickard" <agentrickard at gmail.com>
> wrote:
> > @Earnie: Take a look at the FeedAPI SoC project, it includes pluggable XML
> > parsing and node creation hooks.
> >
> > @Larry-
> >
> > My preference here is the old 'lazy instantiation' trick. [1]
> >
> > Import the data and write a callback that will present the table view of
> > courses, etc.  You're dealing with structured data, so your callbacks
> > should make it easy for people to browse the data (think MVC).
> >
> > Keep a {data_node} lookup table.
> >
> > When writing links for individual items, check the {data_node} table.  If
> > found, write the link to node/NID; otherwise, write it to a node-generating
> > callback that also inserts a record into {data_node}.
> >
> > This way, you only create nodes when your users want to interact with
> > them.  Saves lots of processing overhead.
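[Editorial aside: Ken's {data_node} check can be sketched as follows. This is an illustrative Python analogue with an in-memory dict standing in for the {data_node} table and made-up URL paths, not the actual Drupal hook code:]

```python
# In-memory stand-in for the {data_node} lookup table: data key -> node ID.
data_node = {}
next_nid = 1

def link_for(data_key):
    """Link straight to the node if one exists, else to the generator callback."""
    nid = data_node.get(data_key)
    if nid is not None:
        return f"node/{nid}"
    return f"generate/{data_key}"   # callback path; creates the node on demand

def generate_node(data_key):
    """The callback: create the node lazily and record it in {data_node}."""
    global next_nid
    nid = data_node.setdefault(data_key, next_nid)
    if nid == next_nid:
        next_nid += 1
    return nid

print(link_for("CS101"))    # generate/CS101 -- no node exists yet
generate_node("CS101")      # a user clicked through; node created exactly once
print(link_for("CS101"))    # node/1 -- subsequent links go straight to the node
```

The key property is that node creation cost is deferred until first access, which is why this works well for large imports where most items are never viewed.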
> >
> > I have some sample code if you need it.
> >
> > One drawback: if you want the data to be searchable, you either have to
> > implement your own hook_search, or wait for the nodes to be created.
> >
> >
> > - Ken
> >
> > [1] http://barcelona2007.drupalcon.org/node/58
> >
> >
> > On 8/28/07, Earnie Boyd <earnie at users.sourceforge.net> wrote:
> >>
> >>
> >>
> >> Quoting Larry Garfield <larry at garfieldtech.com>:
> >>
> >> >
> >> > So, I toss the brain-teaser out there: Is there a good way to have my
> >> > nodes and import them too, or are these cases where nodes are simply the
> >> > wrong tool and the direct-import-and-cache mechanisms described above are
> >> > the optimal solutions?
> >> >
> >>
> >> Not that I've found, and I've spent several hours recently researching
> >> this.  Chris Mellor and I have begun collaborating on this issue at
> >> http://portallink.linkmatics.com/gdf and have a development staging site at
> >> http://datafeed.progw.org.  Help is welcome; we want to be able to feed in
> >> all types of external data.  Goals are being established and documented
> >> on the http://portallink.linkmatics.com/gdf pages.  *Note*: we are aware
> >> of the existing modules and APIs, and our plan is to use the existing
> >> pieces as well as create what is missing.
> >>
> >> I've found http://drupal.org/project/feedparser, which will accept RSS,
> >> RDF, or Atom feeds and create nodes or aggregated lists.  I am
> >> successfully using that module with a change documented in issue
> >> http://drupal.org/node/169865 at http://give-me-an-offer.com.
> >>
> >> Earnie
> >>
> >
> >
>
>