I have a project in which I am being given ~10K RTF files, each representing one page of a dead-tree book that was scanned and then manually edited. I want to turn these into nodes in Drupal. I have a converter that turns them into HTML. The result is very close to valid HTML and is actually in pretty good shape.
So the task now is to import these files into Drupal. They are hierarchically stored in VolumeX/PageY.html where VolumeX is a folder and PageY.html is the HTML for that page. I have found two options for this import:
wgHTML: http://drupal.org/project/wgHTML
Import HTML: http://drupal.org/project/import_html
The first apparently doesn't import anything; it just runs a real-time search for each page request, which seems non-ideal at first glance. The second seems reasonable, but a bit complex to configure and run.
My inclination is to try to work with the second module and see if I can get it to work and do the import once on my local Windows PHP5 machine.
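Whichever module ends up doing the import, it may help to list the VolumeX/PageY.html files in reading order first, so nothing is skipped and Page10 does not sort before Page9. Here is a rough sketch in Python; the function name `page_inventory` and the assumption that folder and file names contain a single number (Volume3, Page12.html) are mine, not something any of these modules require.

```python
import re
from pathlib import Path

_NUM = re.compile(r"(\d+)")


def page_inventory(root):
    """Return (volume_number, page_number, path) tuples in reading order.

    Assumes a layout like root/Volume3/Page12.html (hypothetical names
    matching the VolumeX/PageY.html pattern).
    """
    entries = []
    for path in Path(root).glob("Volume*/Page*.html"):
        vol = _NUM.search(path.parent.name)
        page = _NUM.search(path.stem)
        if vol and page:
            entries.append((int(vol.group(1)), int(page.group(1)), path))
    # Sort numerically so Volume2 comes before Volume10, Page9 before Page10.
    entries.sort(key=lambda e: (e[0], e[1]))
    return entries
```

Comparing `len(page_inventory("path/to/export"))` against the expected page count is a cheap sanity check before and after the import.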
Any thoughts or advice on this topic? :)
Thanks!
You might want to take a look at the Node Import module, as well: http://drupal.org/project/node_import
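As far as I know, Node Import works from delimited text files (CSV/TSV) rather than from HTML directly, so the pages would need to be flattened into one file first. A minimal sketch of that step in Python follows; the column names `title` and `body`, the title format, and the helper names are assumptions for illustration, and the actual field mapping would be done in the module's own import screens.

```python
import csv
import re
from pathlib import Path

_BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_html(html):
    """Return the markup inside <body>, or the whole text if there is none."""
    match = _BODY.search(html)
    return (match.group(1) if match else html).strip()


def write_node_csv(root, out_path):
    """Write one CSV row per root/VolumeX/PageY.html file; return the row count."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["title", "body"])  # hypothetical column names
        # Plain sorted() is lexicographic; fine for a sketch, not for final order.
        for path in sorted(Path(root).glob("Volume*/Page*.html")):
            html = path.read_text(encoding="utf-8", errors="replace")
            title = f"{path.parent.name} {path.stem}"
            writer.writerow([title, body_html(html)])
            count += 1
    return count
```

The `csv` module takes care of quoting, so HTML containing commas, quotes, or newlines still ends up in a single field.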
..chris
Chris Johnson wrote:
You might want to take a look at the Node Import module, as well: http://drupal.org/project/node_import
Thanks.