[support] HTML Import

Fred Jones fredthejonester at gmail.com
Tue Nov 13 17:45:56 UTC 2007


I have a project in which I am being given about 10,000 RTF files, each 
representing one page of a dead-tree book that was scanned and then 
manually edited. I want to make these pages nodes in Drupal. I have a 
converter that converts them into HTML. The result is very close to 
valid HTML and is, in fact, not too bad.

So the task now is to import these files into Drupal. They are stored 
hierarchically as VolumeX/PageY.html, where VolumeX is a folder and 
PageY.html is the HTML for that page. I have found two options for 
this import:

wgHTML: http://drupal.org/project/wgHTML
Import HTML: http://drupal.org/project/import_html

The first apparently doesn't actually import anything; it just runs a 
real-time search for each page request, which seems non-ideal to me at 
first glance. The second seems reasonable, but a bit complex to 
configure and run.

My inclination is to work with the second module, see if I can get it 
running, and do the import once on my local Windows machine with PHP 5.
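
Before running any import, it may help to confirm that every file is 
found and that pages sort numerically (Page2 before Page10). Below is a 
minimal sketch of such a check in Python, not part of either module. 
The folder pattern "Volume<N>", the file pattern "Page<N>.html", and the 
function name list_pages are assumptions based on the layout described 
above; adjust them to the real names.

```python
import re
from pathlib import Path

# Assumed naming scheme: folders "Volume<N>", files "Page<N>.html".
VOLUME_RE = re.compile(r"^Volume(\d+)$")
PAGE_RE = re.compile(r"^Page(\d+)\.html$")


def list_pages(root):
    """Return (volume, page, path) tuples sorted by volume, then page."""
    pages = []
    for vol_dir in Path(root).iterdir():
        vol = VOLUME_RE.match(vol_dir.name)
        if not (vol and vol_dir.is_dir()):
            continue  # skip anything that is not a VolumeN folder
        for html in vol_dir.iterdir():
            page = PAGE_RE.match(html.name)
            if page and html.is_file():
                pages.append((int(vol.group(1)), int(page.group(1)), html))
    # Sort on the numeric parts so Page2 comes before Page10.
    return sorted(pages, key=lambda t: (t[0], t[1]))
```

Comparing len(list_pages(root)) against the expected page count would 
show whether any files were skipped because of unexpected names.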

Any thoughts or advice on this topic? :)

Thanks!

