Worth asking here, as I've found no good solutions yet, and I'm sure others have faced similar issues and solved them in various ways.

Despite CS shipping with node_import, and a few people posting on the Drupal forums about scripts they've coded up at various times to import old HTML sites, there doesn't seem to be a good, mature way to import an existing HTML-based site, except for the old cut-and-paste approach.

Ideally, given either a set of HTML files or a set of URLs, a script should be able to import each page's HTML and make a node. Stripping out things like headers and footers would be good.

Problems:

node_import: requires CSV compilation of pages. How do you convert existing pages into a CSV? Script?

import_export (despite the 4.5 label on Drupal, it was made 4.6 friendly): same problem... also does XML...

import_html: http://coders.co.nz/drupal_development/?q=taxonomy/term/4
Specific to the job, but Dan hasn't released the code fully yet... in email, he mentioned it was missing bits.

wgHTML: an ugly hack that wraps HTML Drupalishly, but not a real import.

Other scripts: haven't found a good example yet. Got one?

Thanks for any pointers... and I'll report back and post a nice howto for use in the handbook.

obDevel: I just noticed a bug I filed a year ago finally closed... br tags are no longer standard on labels. http://drupal.org/node/15609 Yeah! Only took a whole year to get closed.
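On the "how do you convert existing pages into a CSV?" question, a rough sketch of one way to do it with a throwaway Python script follows. It is not an existing Drupal tool. The column names `title` and `body` are assumptions, as are the function names; whatever node_import actually expects would need to be checked and the headers adjusted. The regular expressions are a shortcut that is usually good enough for a one-time migration of simple pages, not a real HTML parser, and header/footer stripping is left out because it depends on site-specific markup.

```python
import csv
import html
import os
import re

# Shortcut patterns for simple static pages (not a full HTML parser).
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_page(markup, fallback_title):
    """Return (title, body) for one HTML document.

    The title falls back to `fallback_title` when no <title> is found,
    and the body falls back to the whole document when no <body> is found.
    """
    title_match = TITLE_RE.search(markup)
    body_match = BODY_RE.search(markup)
    title = html.unescape(title_match.group(1)) if title_match else ""
    title = " ".join(title.split()) or fallback_title
    body = body_match.group(1).strip() if body_match else markup.strip()
    return title, body


def collect_rows(root):
    """Walk `root` recursively and build one row per .html/.htm file."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order predictable
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, encoding="utf-8", errors="replace") as handle:
                title, body = extract_page(handle.read(), fallback_title=name)
            rows.append({"title": title, "body": body})
    return rows


def write_csv(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["title", "body"])
        writer.writeheader()
        writer.writerows(rows)
```

Usage would be along the lines of `write_csv(collect_rows("old_site"), "pages.csv")`, where `old_site` is a local copy of the existing HTML tree.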
Add to the above:

xstatic module: In use by a large Drupal site to wrap existing HTML. Unreleased despite talk about it being released.

Add to the features:

(optional) Upon import, create an alias for each new node with the existing URL of the page so that you do not get 404s.
> xstatic module: In use by a large Drupal site to wrap existing HTML. Unreleased despite talk about it being released.
Thanks for the heads-up (I didn't know it existed), but it's in CVS now (4 weeks old): http://cvs.drupal.org/viewcvs/drupal/contributions/modules/xstatic/

It suffers from the same fundamental problem as wgHTML: it's a hack to display old HTML, but it's not creating nodes... which is the goal here: true Drupal nodes as an end result. Cut and paste does that, of course.

One more prior attempt that doesn't _quite_ fit the bill as written is importpage: http://cvs.drupal.org/viewcvs/drupal/contributions/modules/importpage/

It requires entering a URL into a form to grab a single existing page, and the node type is forced to 'book', but overall this is likely the closest, and the code might be massageable into something more automatic...
> Add to the features: (optional) Upon import, create an alias for each new node with the existing URL of the page so that you do not get 404s.
Yes, adding a path should be trivial, since file XYZ.html should get a path of XYZ.html. If the script recurses into subdirectories, the path needs to include those as well.

I'm looking for an equivalent of Killes' scripts in the Devel module that import taxonomy terms... it doesn't have to be user friendly or have a pretty interface... this is a one-time thing.
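For what it's worth, the path mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names and the `old_site` directory are made up, and actually creating the alias in Drupal is not shown. It just computes, for each file, the path the old page would have been served from, subdirectories included.

```python
from pathlib import Path


def alias_for(root, file_path):
    """Return the old site-relative URL path for one file.

    For example, old_site/about/team.html under root old_site
    maps to "about/team.html".
    """
    return Path(file_path).relative_to(root).as_posix()


def build_alias_map(root):
    """Map every .html/.htm file under `root` (recursively) to its alias."""
    root = Path(root)
    pages = sorted(
        page
        for page in root.rglob("*")
        if page.is_file() and page.suffix.lower() in {".html", ".htm"}
    )
    return {str(page): alias_for(root, page) for page in pages}
```

Each value in the resulting map is the string that would be set as the path for the node created from that file, so old links keep resolving.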
I've just done a major site migration by pushing the site into subscribe.module through XML-RPC. The site I was pushing in had metadata which I mapped to vocabularies. It took some hacking but the result was pleasant as it was 1400 nodes, 700+ terms. So that's another option; probably not the easiest at this point.
On Friday 20 January 2006 23:34, John VanDyk wrote:
> I've just done a major site migration by pushing the site into subscribe.module through XML-RPC. The site I was pushing in had metadata which I mapped to vocabularies. It took some hacking but the result was pleasant as it was 1400 nodes, 700+ terms. So that's another option; probably not the easiest at this point.
This sounds like an interesting option for a site I need to convert in the next several months, with about 1500 pages. I don't have embedded metadata, but the directory structure of the existing site would let me generate at least the basics.

Would you consider sharing your script in the hints/tips section of the Drupal contrib/sandbox area? I'm betting a number of folks would be interested in seeing what you've done, and maybe using the code as a starting point. I have no illusions that it would work as-is for my needs, but adapting might be quicker than coding from scratch.

Kind regards,
Scott

--
Scott Courtney
Drupal user name: "syscrusher" http://drupal.org/user/9184
scott at 4th dot com
Drupal projects: http://drupal.org/project/user/9184
Sandbox: http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/syscrusher
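As a rough illustration of generating basic metadata from the directory structure, here is a small Python sketch. It is an assumption about how such a mapping might look, not anyone's actual migration script: each directory component of a page's relative path is treated as a term, with the enclosing directory as its parent. The function names are hypothetical, and terms are identified by name only, so two directories with the same name in different places would collapse into one term.

```python
from pathlib import PurePosixPath


def terms_for(relative_path):
    """Return the directory components of a site-relative path as terms.

    "products/widgets/blue.html" gives ["products", "widgets"];
    a page at the top level gives an empty list.
    """
    return list(PurePosixPath(relative_path).parent.parts)


def term_hierarchy(relative_paths):
    """Return sorted unique (parent, term) pairs; parent is None at top level."""
    pairs = set()
    for relative_path in relative_paths:
        parent = None
        for term in terms_for(relative_path):
            pairs.add((parent, term))
            parent = term
    # None cannot be compared with strings, so sort top-level terms first.
    return sorted(pairs, key=lambda pair: (pair[0] or "", pair[1]))
```

The pairs from `term_hierarchy` could be used to create the vocabulary terms first, and `terms_for` then gives the terms to attach to each imported page.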
participants (4)
- John VanDyk
- Khalid B
- Seth Cohn
- Syscrusher