On Sat, 2009-08-01 at 17:33 +0200, Gerhard Killesreiter wrote:
Pierre Rineau wrote:
Hello all,
While working on a custom project for my company, I developed a module for large-scale migration between sites.
The module is built on a fully object-oriented layer.
Internally, it abstracts the objects to be migrated from a master site to its clients. This abstraction defines how to build the object dependency tree and how to serialize objects.
Implementing an object type (node, user, taxonomy, whatever) is really simple: it is a class with only three methods (register dependencies, save, and update), plus a custom registry that lets the developer store data before serialization and get it back afterwards.
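A minimal Python sketch of that three-method contract (the module itself is PHP). The names `Entity`, `Registry`, `register_dependencies` and `TermEntity` are hypothetical, chosen only to mirror the description, and `save()` just records data in the registry instead of writing to a real site.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class Registry:
    """Hypothetical store for data that must survive serialization."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


class Entity(ABC):
    """Abstract object to migrate; concrete types implement three methods."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier
        self.registry = Registry()

    @abstractmethod
    def register_dependencies(self) -> list["Entity"]:
        """Return the entities that must be migrated before this one."""

    @abstractmethod
    def save(self) -> None:
        """Create the object on the client site."""

    @abstractmethod
    def update(self) -> None:
        """Refresh an object that already exists on the client site."""


class TermEntity(Entity):
    """Hypothetical taxonomy term whose parent term is a dependency."""

    def __init__(self, identifier: str, name: str,
                 parent: Optional["TermEntity"] = None) -> None:
        super().__init__(identifier)
        self.parent = parent
        # Data stored on the master side, to be read back after unserializing.
        self.registry.set("name", name)

    def register_dependencies(self) -> list[Entity]:
        return [self.parent] if self.parent is not None else []

    def save(self) -> None:
        # Stand-in for a real insert on the client: just record what was saved.
        self.registry.set("saved_as", self.registry.get("name"))

    def update(self) -> None:
        self.save()
```

The point of the abstract base is that the migration engine only ever calls these three methods, so adding a new object type never touches the engine.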
All error handling is exception-based, and the lower software layers won't fail because of an unrecoverable error in a higher layer.
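One way to read this: the core import loop (the lower layer) catches errors raised by individual entity implementations (the higher layer), records them, and carries on. A minimal Python sketch under that assumption; `EntityError`, `run_import` and the always-failing `broken_node` step are hypothetical.

```python
from typing import Callable


class EntityError(Exception):
    """Hypothetical unrecoverable error raised by one entity implementation."""


def run_import(steps: dict[str, Callable[[], None]]) -> tuple[list[str], dict[str, str]]:
    """Lower layer: run each entity's import step, isolating failures."""
    done: list[str] = []
    errors: dict[str, str] = {}
    for name, step in steps.items():
        try:
            step()
            done.append(name)
        except EntityError as exc:
            # Record the failure for this entity; the loop keeps going.
            errors[name] = str(exc)
    return done, errors


def broken_node() -> None:
    """Example step that always fails."""
    raise EntityError("revision mismatch")


done, errors = run_import({
    "term:1": lambda: None,
    "node:7": broken_node,
    "user:3": lambda: None,
})
print(done)    # ['term:1', 'user:3']
print(errors)  # {'node:7': 'revision mismatch'}
```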
Object fetching is based on a push/pull mechanism. The server pushes the sync order, and the client answers whether it accepts it. If it does, the client creates a job using the DataSync module, which lets the job run as a CLI process (which won't hurt the web server and allows a larger memory limit at run time).
I am generally not happy with DataSync's approach of running shell scripts as the webserver user. Have you considered using Drush instead?
Drush might be worth a look, but I actually chose DataSync because of its transaction support.
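The push step of the handshake might look roughly like this Python sketch. It assumes the server generates one transaction ID per client at push time, and the `Client.jobs` list only stands in for a DataSync job; none of these names come from the module or from DataSync.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical client site that accepts or refuses sync orders."""
    accepting: bool = True
    jobs: list[str] = field(default_factory=list)

    def receive_sync_order(self, transaction_id: str) -> bool:
        """Answer the pushed order: True for OK, False otherwise."""
        if not self.accepting:
            return False
        # Stand-in for creating a DataSync job that later runs from the CLI.
        self.jobs.append(transaction_id)
        return True


class Server:
    def push(self, clients: list[Client]) -> dict[int, str]:
        """Push a sync order to each client; map accepting clients to their transaction ID."""
        accepted: dict[int, str] = {}
        for index, client in enumerate(clients):
            transaction_id = uuid.uuid4().hex
            if client.receive_sync_order(transaction_id):
                accepted[index] = transaction_id
        return accepted
```

Only the short accept/refuse answer happens inside the web request; the heavy work is deferred to the queued job.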
During the DataSync job, the client pulls an initial set of content and, while browsing it, fetches dependencies incrementally by pulling from the server again. This uses XML-RPC, but the fetching component is also abstracted and could use any other communication method.
Wouldn't the server be better qualified to decide which data the client needs?
The server does decide: it gives the client a transaction ID, and at pull time the client requests data by sending that transaction ID, without knowing what is coming. The whole import is handled by the client browsing a list of abstract entities without knowing their concrete implementation.
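A rough Python sketch of the pull side, assuming the server keeps a map from transaction ID to the root objects it selected: the client pulls that initial set blindly, then asks for missing dependencies by key. `Fetcher` plays the role of the abstracted transport (XML-RPC in the module); `InMemoryFetcher`, `client_import` and the `key`/`deps` object layout are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Fetcher(ABC):
    """Transport abstraction: XML-RPC in the module, but any channel could implement it."""

    @abstractmethod
    def pull(self, transaction_id: str, keys: Optional[list[str]] = None) -> list[dict]:
        """Return the transaction's initial set (keys=None) or the requested objects."""


class InMemoryFetcher(Fetcher):
    """Hypothetical stand-in for the server, which alone decides what a transaction holds."""

    def __init__(self, objects: dict[str, dict], transactions: dict[str, list[str]]) -> None:
        self.objects = objects
        self.transactions = transactions

    def pull(self, transaction_id: str, keys: Optional[list[str]] = None) -> list[dict]:
        if transaction_id not in self.transactions:
            raise KeyError(f"unknown transaction {transaction_id}")
        wanted = self.transactions[transaction_id] if keys is None else keys
        return [self.objects[key] for key in wanted]


def client_import(fetcher: Fetcher, transaction_id: str) -> list[str]:
    """Pull the initial set without knowing its content, then fetch dependencies as needed."""
    seen: set[str] = set()
    received: list[str] = []
    queue = fetcher.pull(transaction_id)
    while queue:
        obj = queue.pop(0)
        if obj["key"] in seen:
            continue
        seen.add(obj["key"])
        received.append(obj["key"])
        missing = [dep for dep in obj["deps"] if dep not in seen]
        if missing:
            # Incremental fetching: pull the server again for unseen dependencies.
            queue.extend(fetcher.pull(transaction_id, missing))
    return received
```

Because the client only ever sends a transaction ID and dependency keys, the server stays in control of what the transaction contains. A shared dependency may be requested more than once, but the `seen` check keeps it from being imported twice.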
To stay unobtrusive on the system, a smart unset() is done after each dependency subtree is built, and a recursion breaker handles circular dependencies.
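A minimal Python sketch of a dependency-first walk with a recursion breaker, assuming dependencies are given as a map from object key to the keys it depends on. Keys already on the current recursion path are skipped, so a cycle such as `a -> b -> a` terminates, and each finished entry is popped from a working copy as a loose analogue of unset(). The function name and data layout are hypothetical.

```python
def build_order(root: str, deps: dict[str, list[str]]) -> list[str]:
    """Return keys in dependency-first order, breaking circular references."""
    pending = dict(deps)  # working copy; entries are dropped once their subtree is built
    in_progress: set[str] = set()
    done: set[str] = set()
    order: list[str] = []

    def visit(key: str) -> None:
        if key in done or key in in_progress:
            # Recursion breaker: already built, or a cycle back up the current path.
            return
        in_progress.add(key)
        for dep in pending.get(key, []):
            visit(dep)
        in_progress.discard(key)
        done.add(key)
        order.append(key)
        # Loose analogue of unset(): release data for the finished subtree.
        pending.pop(key, None)

    visit(root)
    return order
```

Very deep dependency chains would hit Python's default recursion limit; an explicit stack would avoid that in a real implementation.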
Have you tried it with PHP 5.3?
PHP 5.3 differs too much from prior versions, so I don't really want to support it. Besides, it may already be outdated because of PHP 6 development.
I created this module because the Deploy module seems so unstable that I did not want to risk running it on clients' production sites. I have started implementing a kind of "deploy plan" using profiles based on Views: you build a set of views and save them in a profile, and then every object those views reference is synchronized.
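At its core, such a deploy plan collects the union of objects referenced by a profile's views. A hypothetical Python sketch, where each view is modeled as a callable returning object keys (real Views would run database queries); views are visited in name order only to keep the result deterministic.

```python
from typing import Callable

# Hypothetical model: a view is a callable returning the keys of the objects it lists.
View = Callable[[], list[str]]


def profile_objects(profile: dict[str, View]) -> list[str]:
    """Collect every object referenced by a profile's views, without duplicates."""
    selected: list[str] = []
    seen: set[str] = set()
    for name in sorted(profile):  # name order only keeps the result deterministic
        for key in profile[name]():
            if key not in seen:
                seen.add(key)
                selected.append(key)
    return selected


profile = {
    "news": lambda: ["node:1", "node:2"],
    "staff": lambda: ["user:5", "node:2"],
}
print(profile_objects(profile))  # ['node:1', 'node:2', 'user:5']
```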
Right now, the module fully synchronizes taxonomy and content types and partially synchronizes users (including core account information and passwords); I still have a small bug to fix with nodes (a revision problem, I think).
This design might run into performance or overhead problems with a very large amount of data; it could break easily.
How large is your "very large"? If I wanted to sync 10k nodes to 100 client sites, how successful would I be?
I can't tell you right now: I'm in active development and have only tested with small amounts of data (about ten nodes). It is at an early stage, and I need to test and benchmark it to discover its limits.
The only way to be sure it won't break is, I think, to migrate data in many small batches. The problem with doing that is that it will be really hard to keep the transactional context of the DataSync module.
Yeah, one reason to let the server handle this, no?
There are a lot of other custom goodies coming.
First, what do you think about such a module? Should I commit it to drupal.org? Are people interested?
I am certainly interested, especially if my concerns from above can be addressed. ;)
And now that I have described the module, what name should I give it, considering that I'll probably commit it to drupal.org if people are interested?
I thought about "YAMM" (Yet Another Migration Module) or "YADM" (Yet Another Deployment Module).
The fact is there are *a lot* of modules that try to do the same thing as this one; I just want a simple and expressive name.
Data migration is an important and diverse task. IMO it doesn't hurt to have several approaches.
Cheers, Gerhard
Pierre.