Pierre Rineau wrote:
Hello all,
Working on some custom project for my company, I developed a module to do massive migration between sites.
This module uses a full OO layer.
Its internal mechanism is based on abstracting the objects to migrate from a master site to clients. This abstraction defines how to construct the object dependency tree and how to serialize objects.
Object implementation (node, user, taxonomy, whatever) is really simple: each is a class with only three methods (register dependencies, save, and update), using a custom registry that lets the developer save and get back data before and after serialization.
All error handling is exception oriented, and lower software layers won't fail on higher layers' unrecoverable errors.
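The three-method object abstraction and the exception-based error handling described above could look roughly like the following. This is only a sketch, written in Python for brevity rather than in the module's PHP; every name in it (`SyncObject`, `Registry`, `SyncError`, `TermObject`, `sync_all`) is hypothetical and not taken from the module.

```python
from abc import ABC, abstractmethod


class Registry(dict):
    """Hypothetical scratch space for data kept across serialization."""


class SyncError(Exception):
    """Hypothetical recoverable failure affecting a single object."""


class SyncObject(ABC):
    """Hypothetical base class: one subclass per object type."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.registry = Registry()

    @abstractmethod
    def register_dependencies(self):
        """Return the identifiers this object depends on."""

    @abstractmethod
    def save(self):
        """Create the object on the client site."""

    @abstractmethod
    def update(self):
        """Update an object that already exists on the client site."""


class TermObject(SyncObject):
    """Assumed example type: a taxonomy term depending on its vocabulary."""

    def register_dependencies(self):
        return [self.registry["vocabulary"]]

    def save(self):
        if "name" not in self.registry:
            raise SyncError(f"term {self.identifier} has no name")
        return f"saved term {self.identifier}"

    def update(self):
        return f"updated term {self.identifier}"


def sync_all(objects):
    """Save each object; a failure on one object does not abort the rest."""
    results, errors = [], []
    for obj in objects:
        try:
            results.append(obj.save())
        except SyncError as exc:
            errors.append(str(exc))
    return results, errors
```

The point of the design is that the caller catches per-object errors, so a broken object is reported without stopping the whole run.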
Object fetching is based on a push/pull mechanism. The server pushes the sync order, and the client responds OK or not. If OK, it creates a job using the DataSync module, which allows it to run as a CLI thread (which won't hurt the web server, and allows us a larger memory limit at run time).
I am generally not happy with DataSync's approach of running shell scripts as the webserver user. Have you considered using Drush instead?
During the DataSync job execution, the client pulls an initial set of content and, while browsing it, fetches dependencies incrementally (by pulling from the server again) over XML-RPC (the fetching component is also abstracted, and could use any communication method other than XML-RPC).
Wouldn't the server be better qualified to decide which data the client needs?
To be unobtrusive on the system, smart unset() is done after building a dependencies subtree, and there is a recursion breaker in case of circular dependencies.
Have you tried it with PHP 5.3?
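The incremental dependency fetching with a recursion breaker described above could be sketched as below. This is an illustration in Python, not the module's PHP code; `build_dependency_order` and `fetch_dependencies` are hypothetical names, and the callable stands in for whatever transport is plugged in (XML-RPC or otherwise).

```python
def build_dependency_order(root_ids, fetch_dependencies):
    """Return ids ordered so that dependencies come before dependents.

    fetch_dependencies is a hypothetical callable standing in for the
    transport: it takes an id and returns the ids that object depends on.
    """
    ordered = []
    done = set()
    in_progress = set()

    def visit(object_id):
        if object_id in done or object_id in in_progress:
            # Already handled, or we looped back into a cycle: stop here.
            return
        in_progress.add(object_id)
        for dep_id in fetch_dependencies(object_id):
            visit(dep_id)
        in_progress.discard(object_id)
        done.add(object_id)
        ordered.append(object_id)

    for root_id in root_ids:
        visit(root_id)
    return ordered
```

With a graph where a node references a user that references the node back, the walk still terminates, and each object is fetched only once.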
This module was created because the deploy module seems so unstable that I did not want to risk running it on clients' production sites. I started implementing some sort of "deploy plan" using profiles based on views: you construct a set of views and save them in a profile, then all objects that these views reference will be synchronized.
Right now, the module fully synchronizes taxonomy and content types, partially synchronizes users (including core account information and passwords), and I have a small bug left to handle with nodes (a revision problem, I think).
There might be a performance or overhead problem with this design: with a very large amount of data, it could break easily.
How large is your "very large"? If I wanted to sync 10k nodes to 100 client sites, how successful would I be?
The only way to be sure it won't break is, I think, to migrate stuff in numerous small sets of data. But the problem with doing this is that it will be really hard to keep the transactional context of the DataSync module.
Yeah, one reason to let the server handle this, no?
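To make the trade-off above concrete, splitting a large migration into small sets could be sketched like this. It is an assumed illustration in Python, not how the module or DataSync actually works, and `batches` is a hypothetical helper.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

If each batch ran as its own job, a failure in a later batch would leave the earlier ones already committed, which is exactly the loss of a single transactional context mentioned above.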
There are a lot of other custom goodies coming.
First thing is, what do you think about such a module? Should I commit it on drupal.org? Are there people interested?
I am certainly interested, especially if my concerns from above can be addressed. ;)
And, now that I have described the module, what name should I give it, considering that I'll probably commit it on drupal.org if people are interested?
I thought about "YAMM" (Yet Another Migration Module), or "YADM" (Yet Another Deployment Module).
The fact is there are *a lot* of modules which want to do the same thing as this one; I just want a simple and expressive name.
Data migration is an important and diverse task. IMO it doesn't hurt to have several approaches.
Cheers,
Gerhard