Hi David,

Sorry for the delayed reply... I chose to extend my trip another day. A much-needed break that's paying off in general attitude adjustments and coffee consumption (way down).

So, I was likely not clear about my skill at programming PHP and/or Drupal modules: "not fantastic." I didn't mean to say that the feed is necessarily "so complicated." More specifically, it's botching every parser I've tried, and every tweak I've made to those parsers. It's a predictable XML structure, but several types of elements defy simple parsing, IMO.

First, the main issue I always run into (mainly with the FeedAPI parsers) is numbered arrays. Any time an array key is numeric, most parsers seem to discard the whole tree below that element. The only exceptions are elements where the parser expects an array, like "tags" or such. I did get around this with an admittedly crude hack while using the FeedAPI & SimplePie parser combo: I told the parser to ignore sub-arrays after a certain point for the specific element that was causing me trouble (in this case, an enclosure element). It's detailed here if anyone wants to read further... I apologize again for the hacky nature of my solution: http://www.thisworked.com/content/drupal-feedapi-feed-element-mapper-missing...

The thing is, that's of course a very bad hack that only "solves" one exact problem with a known feed, and there are several other places in the feed where nested arrays with numeric names are also ignored. I gave up on this ugly approach at that point, not wanting to butcher the parser further without just writing my own.

So...
the main reason I failed so miserably with the XML version of this particular feed is that the feed authors have buried a great deal of important data in the <summary> and <description> elements, nested inside a bunch of <dl><dt><dd> tags that seem like they should be parsable with proper care. In the JSON version, these are all very nicely extracted into the root of the feed as separate elements.

After much festering with RSS parsers, I realize now that using <summary>/<description> this way is done solely to get around the limitations of the RSS specs. Any RSS parser will strip off all sorts of special elements, so there's not much point in including them, I suppose. In my case, I could have dug the info out, but I understand their logic: typical RSS readers/parsers would butcher the data.

Here's a quick sample of what the <summary> tag looks like, and why I again failed to figure out how to parse through it and get the data out of all the nested tags into mappable elements for Feed Element Mapper:

    <summary type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <dl>
          <dt>Recommended</dt> <dd class="recommended">FALSE</dd>
          <dt>Width</dt> <dd class="width">640</dd>
          <dt>Height</dt> <dd class="height">480</dd>
          <dt>Categories</dt> <dd class="categories">First, Second, Other</dd>
          (...)
        </dl>
      </div>
    </summary>

The point of this weak example is that a bunch of very useful bits are buried in the <summary> that I haven't been able to extract. I understand that an array recursion expert might swim right through there, but I didn't figure it out before giving up on that path. I do really think that a parser with FeedAPI /should/ be able to dig through that element and pull out all sorts of niceties, but I don't understand how.

...so, this email is getting way too long. Sorry. :P Hopefully this gives enough of a picture. I will go answer other email now. (Seriously sorry about the lack of brevity.)

Paul

David Metzler wrote:
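For what it's worth, here is a minimal sketch of pulling the <dt>/<dd> pairs out of that <summary>, written in Python with the standard-library ElementTree purely for illustration (it is not FeedAPI or Feed Element Mapper code). It assumes the <summary> content can be handed over as a well-formed XML string, uses each <dd>'s class attribute as the key (falling back to the <dt> label), and includes only the fields visible in the sample above; the name parse_summary is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# The <div> declares the XHTML namespace, so dl/dt/dd live in it.
XHTML = "{http://www.w3.org/1999/xhtml}"


def parse_summary(summary_xml: str) -> Dict[str, str]:
    """Pair each <dt> label with the <dd> that follows it inside any <dl>."""
    root = ET.fromstring(summary_xml)
    fields: Dict[str, str] = {}
    for dl in root.iter(XHTML + "dl"):
        label: Optional[str] = None
        for child in dl:
            if child.tag == XHTML + "dt":
                label = (child.text or "").strip()
            elif child.tag == XHTML + "dd" and label is not None:
                # Prefer the dd's class attribute as a machine-friendly key.
                key = child.get("class") or label
                fields[key] = (child.text or "").strip()
                label = None
    return fields


SAMPLE = """<summary type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <dl>
      <dt>Recommended</dt> <dd class="recommended">FALSE</dd>
      <dt>Width</dt> <dd class="width">640</dd>
      <dt>Height</dt> <dd class="height">480</dd>
      <dt>Categories</dt> <dd class="categories">First, Second, Other</dd>
    </dl>
  </div>
</summary>"""

fields = parse_summary(SAMPLE)
print(fields)
# The categories come back as one string; split them for tag mapping.
print([c.strip() for c in fields["categories"].split(",")])
```

Once the values are in a flat key/value structure like this, each one is a single, predictable element that a mapper could target, rather than something buried in nested markup.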
Hey Paul,
I haven't done this with JSON, but I have written some XML-to-nested-array conversion code that might help (or might not). Could you send an example of the feed and explain what makes it so complicated, so that we don't shower you with irrelevant solutions :). Is it that you're trying to parse data that's embedded inside the XML that makes it so nasty, or something else?
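As a rough illustration of the kind of XML-to-nested-array conversion mentioned above, here is a minimal Python sketch using the standard-library ElementTree (it is not David's actual code, which isn't shown in the thread). The conventions are assumptions: attributes become '@name' keys, text next to child elements goes under '#text', and repeated sibling tags become a list. The numeric indices of that list are then ordinary keys, so nothing below them gets dropped.

```python
import xml.etree.ElementTree as ET
from typing import Any


def element_to_data(elem: ET.Element) -> Any:
    """Convert an element into nested dicts/lists.

    Attributes become '@name' keys, repeated child tags become lists,
    and a leaf element with no attributes collapses to its text.
    """
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    node: dict = {"@" + k: v for k, v in elem.attrib.items()}
    for child in children:
        value = element_to_data(child)
        if child.tag in node:
            # Second (or later) sibling with the same tag: promote to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


doc = ET.fromstring(
    "<item><title>Clip</title>"
    "<enclosure url='a.mp4'/><enclosure url='b.ogg'/></item>"
)
print(element_to_data(doc))
```

Namespaced elements keep ElementTree's "{uri}tag" form as their keys, so a real Atom feed would need either those full keys or a small step that strips the namespace.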
You give me the impression that the JSON feed contains more information than the other feed. Is that true?
Dave

On Feb 15, 2009, at 6:33 AM, Paul Hoza wrote:
Hello folks,
I've been struggling for a long time (way too long) to get a data feed imported into CCK nodes. I've attempted a plethora of different strategies, including but not limited to: FeedAPI, Feed Element Mapper, custom parsers, hacking parsers, tweaking mappers, Yahoo! Pipes to create custom feed versions of the original feed, serialized PHP exports, etc.
I'm tired of the project, but I have to find a way to get this to work.
So, I found a few articles on programmatically creating CCK nodes, and I'm hoping to connect with anyone who's had experience doing this with JSON data. There is an XML/RSS version of the feed I need, but it sucks compared to the JSON version in terms of how much data is in there and how it's formatted. The XML version hides a lot of crucial info in a <summary> element... which I might be able to parse separately, but RSS feed aggregators just ignore what's in there. Again, I'd have to write a custom parser to get at it.
Information on using JSON data to create nodes is sparse, but here's an article that hits about as close as I've seen yet:
https://secure.prolucid.com/node/43

I am leaving for a couple of days, so I'll try to get something like this working when I get back, but I hoped to hear from anyone who's done the same thing. I had read other posts about similar methods using drupal_execute() et al., but they all talk only about XML as the data source. I haven't found anything about JSON or (un)serialized PHP sources.
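Here is a rough sketch of the data-mapping half of that approach, written in Python for illustration. It turns one decoded JSON item into a dict shaped like a Drupal 6 node, where each CCK field holds a list of deltas (field_name[0]['value']). The JSON keys, the field_* names and the "video" content type are all hypothetical. On the Drupal side, the equivalent PHP array would be cast to an object and saved with node_save(), or submitted through drupal_execute().

```python
import json
from typing import Any, Dict

# Hypothetical mapping from JSON keys to CCK field names; adjust both
# sides to the real feed and the real content type.
FIELD_MAP = {
    "width": "field_width",
    "height": "field_height",
    "recommended": "field_recommended",
}


def item_to_node(item: Dict[str, Any], node_type: str = "video") -> Dict[str, Any]:
    """Build a dict shaped like a Drupal 6 node with CCK field values."""
    node: Dict[str, Any] = {
        "type": node_type,
        "title": item.get("title", ""),
        "body": item.get("description", ""),
    }
    for json_key, field_name in FIELD_MAP.items():
        if json_key in item:
            # CCK stores each field as a list of deltas: field_x[0]['value'].
            node[field_name] = [{"value": item[json_key]}]
    return node


raw = '{"id": "1234", "title": "Example clip", "width": 640, "height": 480}'
print(item_to_node(json.loads(raw)))
```

Keeping the mapping in one table means adding a field later is a one-line change, and keys that are missing from an item simply leave the corresponding field unset.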
What I really need is an initial import of the JSON feed (a huge feed of 6,200+ items) into my CCK node type. After that, I want to check the feed every day for changes and create new nodes accordingly, which is why FeedAPI really seemed like the ticket, aside from my massive struggles with writing my own parser. For now, I'd be happy with a PHP script that I could call daily with cron.
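The initial-import-then-daily-check logic can be sketched independently of Drupal. This Python sketch assumes every JSON item carries a stable "id" field (an assumption; the real feed may key items differently). It takes a hypothetical create_node callback plus a set of already-imported IDs, which in practice would be persisted between cron runs (for example in a table mapping feed IDs to node IDs). With an empty set, the first run imports every item; later runs only create nodes for IDs not seen before.

```python
import json
from typing import Callable, Dict, Iterable, List, Set


def import_new_items(
    items: Iterable[Dict],
    seen_ids: Set[str],
    create_node: Callable[[Dict], None],
) -> List[str]:
    """Create nodes only for items whose id has not been imported yet."""
    created: List[str] = []
    for item in items:
        item_id = str(item["id"])  # assumes every item has a stable "id"
        if item_id in seen_ids:
            continue
        create_node(item)  # hypothetical hook that builds and saves the node
        seen_ids.add(item_id)
        created.append(item_id)
    return created


feed = json.loads('[{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]')
already_imported = {"1"}
created_nodes: List[Dict] = []
print(import_new_items(feed, already_imported, created_nodes.append))
```

This only detects new items. Catching edits to existing items would also need something like a stored hash of each item's content to compare on every run.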
Thanks for any feedback... sorry for the long post. Part rant, part plea. :)
Cheers, Paul Hoza