Hello,
While trying to fix http://drupal.org/node/191197#comment-705056 (allowing nodes to be imported in batches), I have come to feel a bit sorry about the state of the various import/export solutions in Drupal:
- importexportapi is somewhat dead
- node import can only import everything at once, and does not provide duplicate checking
- feed parser api and feed mapper seem to have a brighter future, but do not allow everything node import / importexportapi does
So I'm wondering where to put my efforts. I'm trying to import 20,000 contacts as CCK nodes, and to let users re-import the same contacts and update the existing nodes, all with a nice import/export interface.
I think the first step would be to have a standardized import/export format for a Drupal node, maybe XML-based (I think there is an XML format for books only), with support for CCK, taxonomy, and all.
This is a very complex problem, but I think it's important for Drupal, as an open platform, to provide excellent import/export tools.
Which module has the greatest chance of providing a solid base? Is there a standard node XML (or some other format) representation? I'd like to spend time on a somewhat future-proof solution.
Any help would be greatly appreciated.
Philippe
Philippe and all,
I don't have the development skills to help on this one, but I promise to be a tester as things get rolling. I totally support the urgency and importance of this work. +1
I was talking with Angie Byron about this at the Portland Lullabot workshop. She noted that data migration modules have a hard time staying current because developers get interested in them when they (desperately) need the functionality and then lose interest once they have finished a large migration of data.
As Drupal grows, it seems likely there would be enough work for a consulting firm to specialize in data migration to Drupal. Other Drupal shops would likely be thrilled to sub out that piece of the work. So I do think there is hope for the future.
Thanks for taking leadership on this.
Shai
In trying to fix http://drupal.org/node/191197#comment-705056 (allow to import nodes in batch), I feel a bit sorry for the status of the various import export solutions in Drupal.
- importexportapi is somewhat dead
It is. But it's the most advanced and flexible solution currently available for Drupal. Many folks have contributed to ImportExport API; however, the module seems to be unmaintained currently. The API docs are in a pre-alpha state, too. So it is definitely not easy to understand how ImportExport API works, and how it could work out for your requirements.
- node import only accepts to import everything at once, and does not provide duplicate checking
While Node Import could be enhanced to support the dupe-checking feature of ImportExport API, the main difference is that it only supports nodes. However, from my own experience I can say that importing data into (CCK-based) nodes with Node Import is a breeze.
- feed parser api and feed mapper seems to be a brighter future, but do not allow everything node import / importexportapi does
Didn't try them yet. According to the module names, I'd say that they fulfill a different purpose. ;)
So I'm wondering where to put my efforts. I'm trying to import 20.000 contacts as cck nodes, and allow user to reimport the same contacts and update the existing nodes, all with a nice import/export interface.
ImportExport is able to check for duplicates, but you have to dig into the API to get an idea of how this could be realized.
I think the first step would be to have standardized import / export format for a Drupal node, maybe xml based (I think there is a xml format for books only), with support for cck, taxonomy, and all.
ImportExport has been built to support a variety of formats. You can import/export from text files, XML, another database, and so on; it's up to you. ImportExport "just" needs a source data definition.
Daniel
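The duplicate check discussed above (re-importing the same contacts and updating the existing nodes) can be sketched independently of any module. This is only a rough illustration, not how Node Import or ImportExport API do it: `import_plan()`, the `email` key column, and the lookup array of existing node IDs are all hypothetical names chosen for the example.

```php
<?php
/**
 * Split incoming rows into inserts and updates by a unique key.
 *
 * Hypothetical helper, not part of Node Import or ImportExport API.
 *
 * @param array  $rows     Incoming rows, each an associative array.
 * @param array  $existing Map of lower-cased key value => existing node ID.
 * @param string $key      Column that uniquely identifies a contact.
 * @return array Rows grouped under 'insert', 'update' (keyed by node ID), 'skipped'.
 */
function import_plan(array $rows, array $existing, $key) {
  $plan = array('insert' => array(), 'update' => array(), 'skipped' => array());
  $seen = array();
  foreach ($rows as $row) {
    if (!isset($row[$key]) || $row[$key] === '') {
      // Without a key the row could never be matched on a later re-import.
      $plan['skipped'][] = $row;
      continue;
    }
    // Normalize so that "Ann@Example.com" and "ann@example.com" match.
    $id = strtolower(trim($row[$key]));
    if (isset($seen[$id])) {
      // Same key twice in one file: keep the first row, skip the repeat.
      $plan['skipped'][] = $row;
      continue;
    }
    $seen[$id] = TRUE;
    if (isset($existing[$id])) {
      $plan['update'][$existing[$id]] = $row;
    }
    else {
      $plan['insert'][] = $row;
    }
  }
  return $plan;
}

// Made-up sample data: one contact already exists as node 12.
$existing = array('ann@example.com' => 12);
$rows = array(
  array('email' => 'Ann@example.com', 'name' => 'Ann B.'),
  array('email' => 'bob@example.com', 'name' => 'Bob'),
  array('email' => 'bob@example.com', 'name' => 'Bob again'),
  array('name' => 'No email'),
);
$plan = import_plan($rows, $existing, 'email');
```

Deciding insert versus update before anything is saved keeps the matching rule in one place; the rows under `update` could then be loaded, changed and saved, and the rows under `insert` created, by whichever import module does the actual node handling.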
On 25 Jan 2008, at 4:56 PM, Daniel F. Kudwien wrote:
It is. But it's the most advanced and flexible solution currently available for Drupal. Many folks have contributed to ImportExport API; however, the module seems to be unmaintained currently. The API docs are in a pre-alpha state, too. So it is definitely not easy to understand how ImportExport API works, and how it could work out for your requirements.
It's a query builder, just like Views / CCK. And it has its own set of _schema-like hooks, which is what's annoying about it, since you get to type everything out ... again. I once wrote an install profile generator for it for Drupal 4.7, but I didn't have the time or energy to update ieapi for D5.
I would add to the list http://drupal.org/project/install_profile_api
Particularly the crud.inc file. I've used it for a couple of projects and added bits to it.
On topic, I do have a node create function that works well most of the time. But it's a bit flaky depending on what contrib modules are doing to the node form.
I focus on passing everything in a simple keyed array format, and the function then makes it look like a node submission. The difficulty I've had is giving the submitted form the right structure: the CCK fields are difficult to emulate, but I'm handling many of them (text, number, nodereference, etc.).
If it's of any interest, here is the uncommitted code. I chopped out some hacks, so this version is untested.
function install_create_content($content_type, $properties) {
  global $user;
  $default = array();
  $default['type'] = $content_type;
  $default['format'] = 0;
  $default['comment'] = 2; // Enable read/write comments.
  $default['status'] = 1; // Published.
  $default['promote'] = 0; // Not promoted to front page.
  $default['sticky'] = 0; // Not sticky.
  $default['created'] = time(); // Node timestamps are Unix timestamps, not formatted strings.
  $default['log'] = 'Updated at ' . date('g:i:sA') . ' via install_create_content';
  $default['name'] = $user->name;
  $default['uid'] = $user->uid;
  foreach ($properties as $property => $value) {
    // Only CCK fields need restructuring; other properties pass through unchanged.
    $field = content_fields($property, $content_type);
    if (isset($field['field_name'])) {
      switch ($field['type']) {
        case 'nodereference':
          $default[$property] = array(array('nid' => $value));
          break;
        case 'userreference':
          // User reference fields store their value under 'uid'.
          $default[$property] = array(array('uid' => $value));
          break;
        default:
          if (is_array($value)) {
            // Uncertain.
            $default[$property][] = $value;
          }
          else {
            $default[$property] = array(array('value' => $value));
          }
      }
      if ($field['widget']['type'] == 'options_select' && isset($default[$property][0]['value'])) {
        if ($field['multiple']) {
          // Collect the selected values first, then attach them as 'keys'.
          $keys = array();
          foreach ($default[$property] as $item) {
            if (isset($item['value'])) {
              $keys[$item['value']] = $item['value'];
            }
          }
          $default[$property]['keys'] = $keys;
        }
        else {
          $default[$property]['key'] = $default[$property][0]['value'];
        }
        //unset($default[$property][0]);
      }
      unset($properties[$property]);
    }
  }
  $node = array_merge($default, $properties);
  $node = node_submit($node);
  node_save($node);
  return $node;
}
For various reasons, I also arrived at node_save()'s doorstep for data import demands of 6000+ nodes of a few different content types.
Most recently I had a little help from Adrian, and managed to import 10,000 nodes with the imagefield populated, as well as the files table.
I find that the most difficult part of data import is understanding the older website's database and file structure and the current customer requirements, and making sure that all the taxonomy and content types are defined.
One quirk, which I haven't figured out yet, is that when populating the imagefield nid I needed to call node_save() twice. node_save() did not save the imagefield nid when first creating a node. Once the node was saved, and the node object was properly populated and passed back by reference, the second call updated the imagefield nid.
For example, most CCK fields can be populated before node_save():
  $node->field_secondarytitle[0]['value'] = "some value";
  $node->field_related[0]['nid'] = 123;
  node_save($node);
For imagefield:
  node_save($node);
  // A second node_save() is needed to save the imagefield.
  $node->field_image[0]['fid'] = $fid;
  node_save($node);
Thanks
Audrey
I was not asking the right question, so I will try to restate it; maybe some developers with better knowledge of Drupal and its modules will know the answer.
As you can see, many people are constructing a node array and submitting it to node_save(). However, would it be possible with Drupal to do something like this?
  $new = get_new_node('article');
  $new->add('title', 'Hello world');
  $new->add('userid', 1);
  $new->commit();
Basically, the new import function would only have to construct the node object and submit it to Drupal.
Here's what I do in the Faq_Ask module:
<code>
$node = array(
  'type' => 'faq',
  'body' => '', /* Empty string rather than null. */
  'title' => $form_values['title'],
  'taxonomy' => array($category => $term),
  'created' => time(),
  'uid' => $user->uid,
  'name' => $user->name,
  'status' => 0, /* Unpublished. */
  'format' => 1, /* Default filter (filtered HTML) */
  'comment' => variable_get('comment_faq', 0),
);
// Okay, let's get it done. Node_submit will prepare it and make it an object.
$node = node_submit($node);
node_save($node);
</code>
Nancy E. Wichmann, PMP
On Jan 29, 2008 11:45 AM, Roman Chyla <roman.chyla@gmail.com> wrote:
however, would it be possible with Drupal to do something like this?
$new = get_new_node('article'); $new->add('title', 'Hello world'); $new->add('userid', 1); $new->commit;
basically, the new import function would only have to construct the node object and submit it to Drupal
Hi Roman,
There would be nothing to prevent any of the examples in this thread from being made into a class with methods like 'add' and 'commit'. The challenge is having working code inside the black box.
Simon
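To make the suggestion concrete, here is a minimal sketch of such a class. Everything in it is an assumption for illustration: `NodeBuilder`, `add()` and `commit()` are not Drupal API, the title check is only an example of validation, and the "black box" is reduced to a callback so the sketch runs outside Drupal. Inside Drupal the callback could instead pass the array through node_submit() and node_save(), as the other examples in this thread do. The example uses 'uid' rather than 'userid', since that is the property name the node arrays in this thread use.

```php
<?php
/**
 * Hypothetical fluent wrapper around "build an array, then save it".
 * The class and method names are illustrative, not Drupal API.
 */
class NodeBuilder {
  private $values;
  private $saver;

  /**
   * @param string   $type  Content type, e.g. 'article'.
   * @param callable $saver Receives the collected values and saves them.
   */
  public function __construct($type, $saver) {
    $this->values = array('type' => $type);
    $this->saver = $saver;
  }

  /** Set one property and return $this so calls can be chained. */
  public function add($name, $value) {
    $this->values[$name] = $value;
    return $this;
  }

  /** Check the collected values, then hand them to the saver callback. */
  public function commit() {
    if (empty($this->values['title'])) {
      // Example validation only; a real importer would check more.
      throw new InvalidArgumentException('A node needs a title.');
    }
    return call_user_func($this->saver, $this->values);
  }
}

// Stand-in for the "black box": records the values and assigns a fake nid.
$saved = array();
$saver = function (array $values) use (&$saved) {
  $values['nid'] = count($saved) + 1;
  $saved[] = $values;
  return (object) $values;
};

$new = new NodeBuilder('article', $saver);
$node = $new->add('title', 'Hello world')->add('uid', 1)->commit();
```

Keeping the save step behind a callback is what makes the hard part swappable: the wrapper stays trivial, and all the per-field structure problems described earlier in the thread live in one function.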
Hi,
I am trying to introspect the forms, and have tried various things to get the document type's form array. I even serialized the array passed to this function:
  $form = call_user_func_array('drupal_retrieve_form', $args);
It works when I go to /node/add/page, but if I try the same thing from the command line, e.g.
  $args = unserialize('a:3:{i:0;s:14:"page_node_form";i:1;a:3:{s:7:"storage";N;s:9:"submitted";b:0;s:4:"post";a:0:{}}i:2;a:4:{s:3:"uid";s:1:"1";s:4:"name";s:5:"admin";s:4:"type";s:4:"page";s:8:"language";s:0:"";}}');
  $form = call_user_func_array('drupal_retrieve_form', $args);
  print_r($form);
I get a completely different array.
Can you please point me in the right direction? I must be missing something obvious. How do I get the form tree array before it is converted to the HTML form?
Thanks
roman
My $.02 is that importexportapi tried to solve the full range of issues associated with data migration and crumbled under the weight of such a task. I would recommend putting effort into node_import or feed api.
A firm which specializes in data migration into Drupal is a *really* good idea.
-moshe
Hi,
Do you think something along these lines would be possible? It is a batch script to import documents, creating objects in EPrints. In Drupal, there would probably be node, user, comment... parallels.
#!/usr/bin/perl -w -I/usr/share/eprints3/perl_lib
# Deposit an eprint in the repository
use EPrints;
use strict;
# Start session
my $session = new EPrints::Session( 1, "et" );
exit( 0 ) unless( defined $session );
# Get archive dataset
my $dataset = $session->get_repository->get_dataset( "archive" );
# Create new eprint
my $eprint = EPrints::DataObj::EPrint::create( $session, $dataset );
$eprint->set_value( "title", "Hello World" );
$eprint->set_value( "creators_name", [
  { family=>"Smith", given=>"John" },
  { family=>"Jones", given=>"Mary" },
] );
$eprint->set_value( "date", "2005-02-02" );
$eprint->set_value( "type", "article" );
$eprint->commit();
# Add document to eprint
my $pdf;
my $filename = "demo.pdf";
if( open( $pdf, $filename ) )
{
  my $doc = EPrints::DataObj::Document::create( $session, $eprint );
  $doc->set_value( "format", "application/pdf" );
  $doc->upload( $pdf, "paper.pdf" );
  close $pdf;
  $doc->set_value( "main", "paper.pdf" );
  $doc->commit;
}
else
{
  print STDERR "Failed to open file: $filename: $!\n";
  print STDERR "Did not create document.\n";
}
# Generate abstract page for new eprint
$eprint->generate_static;
print "Created EPrint #" . $eprint->get_id . "\n";
# End session
$session->terminate();
Import Export API was overly ambitious, and just can't be kept in sync with the many sources. Perhaps a team that maintains it would be able to revive it and keep it in sync.
I have not used it myself, but I have heard people sing the praises of the Import HTML module.
http://drupal.org/project/import_html
If you can export a site to HTML (wget), and don't need "fields" (CCK), then it is a viable solution to many problems.
-- Khalid M. Baheyeldin 2bits.com, Inc. http://2bits.com Drupal optimization, development, customization and consulting.
Here's another project that has some high aims: http://drupal.org/project/convert2drupal
No releases yet, but it might be worth keeping an eye on.
There is also Migration2, by Mike Gifford: http://drupal.org/node/28129 (unpublished)
Building/extending a set of PHP/MySQL tools to help migration to Drupal from other CMS applications.
A generic migration tool should:
1) Scan for a list of tables in the source and destination databases.
2) Allow one or more tables to be selected, so that their data can be inserted into the proper destination tables (this could all be done via a web interface).
3) Be smart enough to check for optional modules (like i18n) and insert the data appropriately.
4) Provide a list of database columns from the selected tables in the source and destination tables (and allow admins to mix and match).
5) Spit out a report at the end to say what data was migrated (briefly) and also account for what data was not.
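Points 4 and 5 of the list above (mixing and matching columns, then reporting what was and was not migrated) can be sketched in a few lines. This is only an illustration under stated assumptions, not code from Migration2: `migrate_rows()` is a hypothetical helper, the rows are plain arrays rather than database results, and the column names are made up.

```php
<?php
/**
 * Copy mapped columns from source rows and report what was left behind.
 *
 * Hypothetical helper for illustration; not taken from any migration module.
 *
 * @param array $rows Source rows, each an associative array of column => value.
 * @param array $map  Admin-chosen mapping of source column => destination column.
 * @return array 'rows' holds the converted rows; 'unmapped' counts, per source
 *               column, how many values had no destination.
 */
function migrate_rows(array $rows, array $map) {
  $migrated = array();
  $unmapped = array();
  foreach ($rows as $row) {
    $out = array();
    foreach ($row as $column => $value) {
      if (isset($map[$column])) {
        $out[$map[$column]] = $value;
      }
      else {
        // Tally data that is not migrated so the final report can account for it.
        $unmapped[$column] = isset($unmapped[$column]) ? $unmapped[$column] + 1 : 1;
      }
    }
    $migrated[] = $out;
  }
  return array('rows' => $migrated, 'unmapped' => $unmapped);
}

// Made-up source data: 'hits' has no destination column in the mapping.
$map = array('headline' => 'title', 'text' => 'body');
$source = array(
  array('headline' => 'First', 'text' => 'Hello', 'hits' => 10),
  array('headline' => 'Second', 'text' => 'World', 'hits' => 3),
);
$result = migrate_rows($source, $map);

// Brief report: how much was migrated, and what was not.
print count($result['rows']) . " rows migrated\n";
foreach ($result['unmapped'] as $column => $count) {
  print "Column '$column' not migrated ($count values)\n";
}
```

Returning the unmapped tally alongside the converted rows means the report in point 5 falls out of the same pass that does the copying, rather than needing a second scan of the source.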
participants (12)
- adrian rossouw
- Audrey Foo
- Daniel F. Kudwien
- emspace.com.au@gmail.com
- Jerad Bitner
- Khalid Baheyeldin
- Moshe Weitzman
- Nancy Wichmann
- Philippe Jadin
- Roman Chyla
- Shai Gluskin
- Simon Hobbs