I am working on importing a large set of HTML pages. There are about 12K pages organized into 40 volumes, and each volume seems to average about 300 pages. I am using the Import HTML module to get them into Drupal, and thus far, initial tests look good.
I have a few questions about how to structure the data, however:
1. How can I create 'next' and 'prev' links to allow the user to move easily between pages?
2. Shall I use one taxonomy vocabulary with 40 entries to label each page as to which volume it belongs to?
3. With regard to images, there are 'image references' in the import files which refer to images named the same as the HTML file. These references are not HTML--they are plain text like <Image Reference Vol 02 Page 8>. I was thinking of doing the import and then writing a script to run through the imported data and convert those to IMG tags. I suppose I could also do the conversion in real time on each page request, but a one-time script seems to make more sense to me.
Any input is appreciated. :)
Thanks, Fred
Fred Jones wrote:
> I am working on importing a large set of HTML pages. There are about 12K pages organized into 40 volumes, and each volume seems to average about 300 pages. I am using the Import HTML module to get them into Drupal, and thus far, initial tests look good.
> I have a few questions about how to structure the data, however:
> - How can I create 'next' and 'prev' links to allow the user to move easily between pages?
Check out the pageroute module: http://drupal.org/project/pageroute and the paging module: http://drupal.org/project/paging
Also, I've never used the book content type, but would it be conceivable to load each of these as pages in a book?
> - With regard to images, there are 'image references' in the import files which refer to images named the same as the HTML file. These references are not HTML--they are plain text like <Image Reference Vol 02 Page 8>. I was thinking of doing the import and then writing a script to run through the imported data and convert those to IMG tags. I suppose I could also do the conversion in real time on each page request, but a one-time script seems to make more sense to me.
That sounds like the way to do it. Does the HTML import create image nodes for images referenced in the import? If so, that would probably be better, and in that case you would want to run the script before importing. Actually, you would likely want to run the script before importing anyway, as a shell script run on files would be faster and less prone to errors than a PHP script run on the database. It's a lot easier to create a copy of the folder for backup than it is to reimport if something goes wrong in the script.
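A minimal sketch of the file-based conversion suggested above, written in Python rather than shell for readability. The placeholder format comes from the original message; the image path scheme (/images/vol02/page008.jpg), the UTF-8 encoding, and the function names are assumptions made only for illustration and would need to match the real files.

```python
import re
from pathlib import Path

# Matches the plain-text placeholders described in the thread,
# e.g. "<Image Reference Vol 02 Page 8>".
IMAGE_REF = re.compile(r"<Image Reference Vol (\d+) Page (\d+)>")


def image_tag(match):
    """Build an IMG tag for one placeholder (the path scheme is an assumption)."""
    volume = int(match.group(1))
    page = int(match.group(2))
    # Hypothetical naming scheme: /images/vol02/page008.jpg
    src = "/images/vol{:02d}/page{:03d}.jpg".format(volume, page)
    alt = "Volume {} page {}".format(volume, page)
    return '<img src="{}" alt="{}" />'.format(src, alt)


def convert_image_refs(html):
    """Replace every image reference placeholder in an HTML string."""
    return IMAGE_REF.sub(image_tag, html)


def convert_folder(folder):
    """Rewrite every .html file under folder in place (assumes UTF-8 files)."""
    for path in Path(folder).rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        path.write_text(convert_image_refs(text), encoding="utf-8")
```

Running convert_folder on a copy of the import folder keeps the originals untouched, which is the backup advantage mentioned above.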
Thank you to those who replied. My response:
> Check out the pageroute module: http://drupal.org/project/pageroute
Now this looks interesting--I will check it out.
> and the paging module: http://drupal.org/project/paging
This I don't need, as the pages are already broken up--one page per HTML file. :)
> Also, I've never used the book content type, but would it be conceivable to load each of these as pages in a book?
I was thinking of the same thing. I had forgotten where this was, as I have never used it, but in reality this is precisely what I need, as it allows me to create chapters and then pages within those chapters in a sequence:
http://drupal.org/handbook/modules/book
> That sounds like the way to do it. Does the HTML import create image nodes for images referenced in the import? If so, that would probably be better, and in that case you would want to run the script before importing. Actually, you would likely want to run the script before importing anyway, as a shell script run on files would be faster and less prone to errors than a PHP script run on the database. It's a lot easier to create a copy of the folder for backup than it is to reimport if something goes wrong in the script.
Hmm, this is interesting advice. The 'rollback' issue doesn't really exist because this is a fresh DB anyway, but it might well be easier to run a script on the files than on the DB. I will consider this.
> Maybe use CCK[2] and its image field[3]?
Good idea--I have used these before--I will see how this will work. In reality, I think I will just upload all the JPGs en masse and set up links to them right in the node HTML code--this should be easy.
> I suggest Node Relativity[1] to control parent/child relationships.
It seems like using Book should handle this.
Thanks, Fred
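The page sequence that a book outline (or hand-built 'next' and 'prev' links) relies on can be worked out from the file names before import. The following is a rough sketch only: the naming pattern vol02_page008.html and the helper names are hypothetical, since the thread does not say how the HTML files are actually named.

```python
import re

# Hypothetical file naming: "vol02_page008.html". Adjust to the real names.
NAME = re.compile(r"vol(\d+)_page(\d+)\.html$")


def sort_key(filename):
    """Return (volume, page) as integers so pages sort numerically."""
    match = NAME.search(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    return (int(match.group(1)), int(match.group(2)))


def neighbours(filenames):
    """Map each file to its (prev, next) file within the same volume."""
    ordered = sorted(filenames, key=sort_key)
    links = {}
    for i, name in enumerate(ordered):
        volume = sort_key(name)[0]
        prev_name = None
        next_name = None
        # Only link to a neighbour that belongs to the same volume.
        if i > 0 and sort_key(ordered[i - 1])[0] == volume:
            prev_name = ordered[i - 1]
        if i + 1 < len(ordered) and sort_key(ordered[i + 1])[0] == volume:
            next_name = ordered[i + 1]
        links[name] = (prev_name, next_name)
    return links
```

Sorting on integer (volume, page) pairs avoids the usual trap where "page10" sorts before "page2" as text. If the pages end up in a book outline in this order, the book navigation should cover the links without extra code.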
Quoting Fred Jones fredthejonester@gmail.com:
> I am working on importing a large set of HTML pages. There are about 12K pages organized into 40 volumes, and each volume seems to average about 300 pages. I am using the Import HTML module to get them into Drupal, and thus far, initial tests look good.
> I have a few questions about how to structure the data, however:
> - How can I create 'next' and 'prev' links to allow the user to move easily between pages?
I suggest Node Relativity[1] to control parent/child relationships.
> - Shall I use one taxonomy vocabulary with 40 entries to label each page as to which volume it belongs to?
It might be beneficial, but it still wouldn't control the parent/child relationship.
> - With regard to images, there are 'image references' in the import files which refer to images named the same as the HTML file. These references are not HTML--they are plain text like <Image Reference Vol 02 Page 8>. I was thinking of doing the import and then writing a script to run through the imported data and convert those to IMG tags. I suppose I could also do the conversion in real time on each page request, but a one-time script seems to make more sense to me.
Maybe use CCK[2] and its image field[3]?
[1] http://drupal.org/project/relativity
[2] http://drupal.org/project/cck
[3] http://drupal.org/project/imagefield
Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/