Fred Jones wrote:
I am working on importing a large set of HTML pages. There are about 12K pages organized into 40 volumes, so each volume, it seems, will average about 300 pages. I am using the Import HTML module to get them into Drupal, and so far, initial tests look good.
I have a few questions about how to structure the data, however:
- How can I create 'next' and 'prev' links to allow the user to move easily between pages?
Check out the pageroute module: http://drupal.org/project/pageroute and the paging module: http://drupal.org/project/paging
Also, I've never used the book content type, but could each of these be loaded as pages in a book?
- With regard to images, there are 'image references' in the import files which refer to images named the same as the HTML file. These references are not HTML; they are plain text like <Image Reference Vol 02 Page 8>. I was thinking of doing the import and then writing a script to run through the imported data and convert those to IMG tags. I suppose I could also do that in real time on each page request, but a one-time script makes more sense to me.
That sounds like the way to do it. Does the HTML import create image nodes for images referenced in the import? If so, that would probably be better, and in that case you would want to run the script before importing. Actually, you would likely want to run the script before importing anyway, as a shell script run on the files would be faster and less error-prone than a PHP script run on the database. It's a lot easier to make a backup copy of the folder than to reimport if something goes wrong in the script.
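As a rough illustration of converting the references in the files before import, here is a minimal sketch in Python (used here for illustration; a shell script with sed would work just as well). It assumes the markers follow the `<Image Reference Vol 02 Page 8>` pattern, that each image has the same base name as its HTML file with a `.jpg` extension, and that reading the files as Latin-1 is acceptable (it round-trips any bytes unchanged). The names `convert_text`, `image_src`, and `convert_folder` are hypothetical, and the backup copy is made first, following the backup advice in the reply.

```python
import html
import re
import shutil
from pathlib import Path

# Assumed marker format, e.g. "<Image Reference Vol 02 Page 8>";
# whitespace and letter case are matched loosely.
IMAGE_REF = re.compile(
    r"<Image\s+Reference\s+Vol\s+(\d+)\s+Page\s+(\d+)\s*>", re.IGNORECASE
)


def image_src(html_path: Path) -> str:
    """Assumption: the image shares the HTML file's base name, with a .jpg extension."""
    return html_path.with_suffix(".jpg").name


def convert_text(text: str, src: str) -> str:
    """Replace each plain-text image reference with an IMG tag pointing at src."""
    def to_img(match: re.Match) -> str:
        vol, page = match.group(1), match.group(2)
        return f'<img src="{html.escape(src, quote=True)}" alt="Vol {vol} Page {page}">'
    return IMAGE_REF.sub(to_img, text)


def convert_folder(src_dir: Path, backup_dir: Path) -> int:
    """Copy src_dir to backup_dir, then rewrite the HTML files in place.

    backup_dir must not exist yet and should live outside src_dir.
    Returns the number of files that were changed.
    """
    shutil.copytree(src_dir, backup_dir)  # raises if backup_dir already exists
    changed = 0
    for path in sorted(src_dir.rglob("*.htm*")):  # matches .htm and .html
        # Latin-1 decodes every byte, so untouched text is written back unchanged.
        original = path.read_text(encoding="latin-1")
        updated = convert_text(original, image_src(path))
        if updated != original:
            path.write_text(updated, encoding="latin-1")
            changed += 1
    return changed
```

For example, `convert_folder(Path("volumes"), Path("volumes_backup"))` returns how many files were rewritten (the folder names are placeholders). If the result looks wrong, copying the backup folder back restores the originals without any reimport.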