Programmatic data importing from JSON source
Hello folks,
I've been struggling for a long time (way too long) to get a data feed imported into CCK nodes. I've attempted a plethora of different strategies, including but not limited to: FeedAPI, Feed Element Mapper, custom parsers, hacking parsers, tweaking mappers, Yahoo! Pipes to create custom feed versions of the original feed, serialized PHP exports, etc.
I'm tired of the project, but I have to find a way to get this to work.
So, I found a few articles on programmatically creating CCK nodes and I'm hoping to connect with anyone who's had experience doing this with JSON data. There is an XML/RSS version of the feed I need, but it sucks compared to the JSON version, with respect to how much data is in there and how it's formatted. The XML version hides a lot of crucial info in a <summary> element... which I might be able to parse through separately, but RSS feed aggregators just ignore stuff in there. Again, I'd have to make a custom parser to get in there.
Information on using JSON data to create nodes is sparse, but this article hits about as close to the mark as I've seen yet: https://secure.prolucid.com/node/43 I am leaving for a couple of days, so I'll try to get something like this working when I get back, but I hoped to hear from anyone who's done the same thing. I had read other posts about similar methods using drupal_execute(), et al., but they all talk only about XML as the data source. I haven't found anything talking about JSON or (un)serialized PHP sources.
What I really need to do is an initial import of the JSON feed into my CCK node type (which is a huge feed of 6,200+ items). After that, I want to check the feed every day for changes and create new daily nodes accordingly -- which is why FeedAPI really seemed like the ticket, aside from my massive struggles with making my own parser. For now, I'd be happy with a PHP script that I could call daily with cron.
Thanks for any feedback... sorry for the long post. Part rant, part plea. :)
Cheers, Paul Hoza
Hi Paul,
Check out http://drupal.org/project/extra_RSS_fields -- it's very alpha and it's Drupal 5, but it sounds like it's approximately what you would need.
Feel free to contact me directly should you have any questions about it.
cheers, Kristof
2009/2/15 Paul Hoza <paulhoza@gmail.com>:
Hello folks,
I've been struggling for a long time (way too long) to get a data feed imported into CCK nodes...
Thank you, Kristof,
It looks like an interesting direction. I'll look into it and get back to you if I decide to go that path. I've gotten a couple of other responses that I'm investigating also.
I'm in fact using Drupal 6 (should have stated that up-front, I now realize!), so I'm somewhat deterred by the notion of needing to port anything right now. My knowledge of module authoring is limited, to say the least.
Thank you very much for your reply, Paul
Kristof Van Tomme wrote:
Hi Paul,
Check out http://drupal.org/project/extra_RSS_fields it's very alpha and it's Drupal 5, but it sounds like it's approximately what you would need.
Feel free to contact me directly should have any questions about it.
cheers, Kristof
2009/2/15 Paul Hoza <paulhoza@gmail.com>:
Hello folks,
I've been struggling for a long time (way too long) to get a data feed imported into CCK nodes...
Hi Paul,
Extra RSS was the first real PHP code I ever wrote, so you won't find crazy code in there ;) I haven't investigated yet whether Views 2 does custom field output in RSS feeds; if not, it shouldn't be too hard to port.
Cheers, Kristof
*****************************
I blog at http://www.pronovix.com/blog
Twitter at http://twitter.com/kvantomme
You can find my profiles on LinkedIn at http://www.linkedin.com/in/kvantomme
XING at https://www.xing.com/profile/Kristof_VanTomme
and Facebook at http://www.facebook.com/people/Kristof-Van-Tomme/618847872
2009/2/18 Paul Hoza <paulhoza@gmail.com>:
Thank you, Kristof, It looks like an interesting direction. I'll look into it and get back to you if I decide to go that path...
Hey Paul,
I haven't done this with JSON, but I have written some XML-to-nested-array conversion stuff that might help, but might not. Could you shoot over an example of the feed and what makes it so complicated, so that we don't shower you with irrelevant solutions :). Is it that you're trying to parse data that's inside XML that makes it so nasty, or something else?
You give me the impression that the JSON feed contains more information than the other feed. Is that true?
Dave
On Feb 15, 2009, at 6:33 AM, Paul Hoza wrote:
Hello folks,
I've been struggling for a long time (way too long) to get a data feed imported into CCK nodes...
Hi David,
Thanks for the reply. I need to run out of town for a quick road trip, but will be back tomorrow afternoon. I will reply more usefully as soon as I get back.
I don't mean to add cruft to the list with a useless reply, but I'm just very happy that a couple of people already replied. (Thanks to Kristof as well.) I'm not even supposed to be checking email since we're a bit late, but I couldn't resist. :)
Thanks... more soon, Paul
David Metzler wrote:
Hey Paul,
I haven't done this with JSON, but have written some XML to nested array conversion stuff that might help, but might not...
JSON is extremely easy to handle, in my experience. Much easier than XML. PHP 5.2+ has a json_decode() function which turns the whole string into nicely structured objects (or arrays, if you pass TRUE as the second argument), for instance.
On Sun, Feb 15, 2009 at 3:07 PM, Paul Hoza <paulhoza@gmail.com> wrote:
Hi David, Thanks for the reply. I need to run out of town for a quick road trip, but will be back tomorrow afternoon...
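To illustrate the json_decode() point above: once decoded, a JSON feed is just nested lists and key/value maps, and numerically indexed lists come through intact. The sketch below uses Python's json module as a stand-in for PHP's json_decode(). The feed layout and every field name in it (items, id, title, width, height, categories) are assumptions made up for illustration, since the real feed is not shown in this thread.

```python
import json

# Hypothetical feed excerpt; the real feed's structure is not shown in the thread.
SAMPLE_FEED = """
{
  "items": [
    {"id": "a1", "title": "First clip", "width": 640, "height": 480,
     "categories": ["First", "Second", "Other"]},
    {"id": "b2", "title": "Second clip", "width": 320, "height": 240,
     "categories": []}
  ]
}
"""


def decode_feed(text):
    """Decode a JSON feed string and return its list of items."""
    data = json.loads(text)
    return data["items"]


items = decode_feed(SAMPLE_FEED)
for item in items:
    # Numerically indexed lists (such as categories) are ordinary lists here.
    print(item["id"], item["title"], item["width"], item["categories"])
```

Each item is then a plain dictionary, so values can be looked up by key without any feed-specific parser.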
Hi David,
Sorry for the delayed reply... I chose to extend my trip another day. A much-needed break that's paying off in general attitude adjustments and coffee consumption (way down).
So, I was likely not clear about my skill at programming PHP and/or Drupal modules: "not fantastic." I didn't mean to say that the feed is necessarily "so complicated", but more specifically, it's botching every parser I've tried -- and each tweak I've tried to the given parsers. It's a predictable XML structure, but there are several different types of elements that defy simple parsing, IMO.
First, the main issues I always run into (with the FeedAPI parsers, mainly) are numbered arrays. Any time there's a numeric array name, most parsers seem to ditch the whole tree below that element. The only exceptions are arrays where the parser knows there's likely to be an array at the element, like "tags" or such. I did get around this with an admittedly crude hack, where I told the parser to ignore sub-arrays after a certain point for a specific element that was causing me trouble (in this case, an enclosure element) while using the FeedAPI & SimplePie parser combo. It's detailed here if anyone wants to read further (I apologize again for the hacky nature of my solution): http://www.thisworked.com/content/drupal-feedapi-feed-element-mapper-missing...
The thing is, since that's of course a very bad hack and only specifically "solves" one exact problem with a known feed, there are several other places in the feed where nested arrays with numeric names are also ignored. I gave up on this ugly approach at this point, not wanting to further butcher the parser without just writing my own.
So... the main reason I failed so miserably with the XML version of this particular feed is the fact that they (the feed authors) have entrenched a great deal of important data in the <summary> and <description> elements, nested inside a bunch of <dl><dt><dd> tags which seem like they should be parsable with proper care. In the JSON version, these are all very nicely extracted out into the root of the feed as separate elements.
I realize now, after much festering with RSS parsers, that the use of <summary>/<description> is done solely to get around the limitations of the RSS specs. Any RSS parser will strip off all sorts of special elements, so there's not much point in including them, I suppose. In my case, I could have dug the info out, but I understand their logic there -- typical RSS readers/parsers would butcher the data.
Here's a quick sample of what the <summary> tag looks like and why I again failed to figure out how to parse through it and get the data out of all the nested tags into mappable elements for Feed Element Mapper:
<summary type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <dl>
      <dt>Recommended</dt> <dd class="recommended">FALSE</dd>
      <dt>Width</dt> <dd class="width">640</dd>
      <dt>Height</dt> <dd class="height">480</dd>
      <dt>Categories</dt> <dd class="categories">First, Second, Other</dd>
      (...)
    </dl>
  </div>
</summary>
The point of this weak example is that there are a bunch of very useful bits buried in the <summary> that I haven't been able to extract. I understand that an array recursion expert might swim right through there, but I didn't figure it out before giving up on that path. I do really think that a parser with FeedAPI /should/ be able to dig through that element and pull out all sorts of niceties, but I don't understand how.
...so, this email is getting way too long. Sorry. :P Hopefully this gives enough of a picture. I will go answer other email now. (Seriously sorry about the lack of brevity.)
Paul
David Metzler wrote:
Hey Paul,
I haven't done this with JSON, but have written some XML to nested array conversion stuff that might help, but might not...
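For the <summary> sample in Paul's message, one possible way to pull the buried values out is to walk the <dd> elements and key each value by its class attribute. This is only a sketch in Python (standing in for whatever a custom FeedAPI parser would do in PHP), and it assumes the pattern shown in the sample holds for the whole feed: every <dd> carries a class naming the field. Only the four pairs shown in the thread are used; the elided "(...)" part is left out.

```python
import xml.etree.ElementTree as ET

# The <dl> sits inside a <div> that declares the XHTML default namespace,
# so its child tags must be matched with that namespace prefix.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

SUMMARY = """<summary type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <dl>
      <dt>Recommended</dt> <dd class="recommended">FALSE</dd>
      <dt>Width</dt> <dd class="width">640</dd>
      <dt>Height</dt> <dd class="height">480</dd>
      <dt>Categories</dt> <dd class="categories">First, Second, Other</dd>
    </dl>
  </div>
</summary>"""


def summary_fields(xml_text):
    """Return {class name: text} for every <dd> found inside the summary."""
    root = ET.fromstring(xml_text)
    fields = {}
    for dd in root.iter(XHTML_NS + "dd"):
        key = dd.get("class")
        if key:
            fields[key] = (dd.text or "").strip()
    return fields


fields = summary_fields(SUMMARY)
print(fields)
```

The values come back as strings; splitting "First, Second, Other" into separate categories or converting the width to a number would be an extra step.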
Hey Paul,
Thanks for the sample. I'm assuming that you've gotten what you needed from the dev list here, as it sounds like you got your problem solved. I'm not an expert on FeedAPI, but I do know XML parsing, and if you were ever to tackle a new FeedAPI parser, I've got some code that you might be interested in. It's the XML-to-array conversion code that I alluded to in another thread. I'm hoping to have it published as a contrib module in some kind of dev form by the end of the month. I'm busy extracting it from some other code that I've got that isn't really appropriate for API distribution.
Let me know if you need any more help.
Dave
On Feb 18, 2009, at 2:42 AM, Paul Hoza wrote:
Here's a quick sample of what the <summary> tag looks like...
Hi Dave,
Thanks for the tip. I have what amounts to a few half-ish done custom parsers -- each in a varied state of mess. I definitely will keep your offer on hand in case I ultimately go that route.
I did get some good responses (a couple that accidentally ended up off-list), so I'm going to tackle this again tonight. The coffee pot's full of fresh Go Juice, so I'm looking forward to getting this done... although I've said that to myself an embarrassing number of times on this project. >_>
(For the curious...) I'm currently working on a fresh approach: directly accessing the JSON feed. I'm making a separate PHP script that reads in the feed and runs it through json_decode(). From there, I plan to dissect the object and map all the different key->value sets out to an array I'll feed into drupal_execute(). If this all works as I plan/hope, I'll be able to run this script with cron every day.
...thus are my plans, anyway. Mice didn't make these plans, so their value is currently in question.
Thanks for the great feedback so far! I'll report back later so I don't leave the thread hanging.
Thanks Dave.
Paul
David Metzler wrote:
Hey Paul,
Thanks for the sample. I'm assuming that you've gotten what you needed from the dev list here...
-- +-----------------------------------------------+ | Paul Hoza | I do: Games, Websites, Multimedia and more. | paul@hoza.net +-----------------------------------------------+
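The plan Paul describes (decode the feed, map each item's key->value pairs to node fields, run daily and only create what is new) could be sketched roughly as below. This is Python, not the PHP script from the thread, and it stops short of node creation: each resulting dictionary is what would then be handed to drupal_execute() or an equivalent. The field map, the CCK-style field names (field_width, field_height), the id key, and the idea of tracking already-imported ids are all assumptions for illustration.

```python
import json

# Hypothetical mapping from feed keys to node field names.
FIELD_MAP = {
    "title": "title",
    "width": "field_width",
    "height": "field_height",
}


def item_to_values(item):
    """Build a flat dict of field values from one decoded feed item."""
    return {field: item[key] for key, field in FIELD_MAP.items() if key in item}


def new_items(items, seen_ids):
    """Return only the items whose id has not been imported before."""
    return [item for item in items if item["id"] not in seen_ids]


# Made-up feed content standing in for the real 6,200+ item feed.
feed = json.loads(
    '[{"id": "a1", "title": "First", "width": 640, "height": 480},'
    ' {"id": "b2", "title": "Second", "width": 320}]'
)
seen = {"a1"}  # ids recorded by earlier runs, e.g. loaded from a file or table

pending = [item_to_values(item) for item in new_items(feed, seen)]
print(pending)
```

On the first run the set of seen ids would be empty, so the whole feed is imported; later cron runs would only pick up items added since.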
Hi!
I have written a module using XML-RPC to import data to a CCK-defined node type. I imagine that it wouldn't be too hard to modify to match your requirements. If you're interested, send me an email...
Ricky
On Feb 15, 2009, at 9:33 AM, Paul Hoza wrote:
Hello folks,
I've been struggling for a long time (way too long) to get a data feed imported into CCK nodes...
Participants (5): Chris Johnson, David Metzler, Kristof Van Tomme, Paul Hoza, Richard Morse