Re: [development] Call to developers, Yet another migration module / how to choose a module name?

1 Aug 2009


      On Sat, 2009-08-01 at 17:33 +0200, Gerhard Killesreiter wrote:
...
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Pierre Rineau wrote:
...
Hello all,
While working on a custom project for my company, I developed a module
for massive migrations between sites.
This module uses a full OO layer.
Its internal mechanism is based on abstracting the objects to migrate
from a master site to its clients. This abstraction defines how to
construct the object dependency tree and how to serialize objects.
Implementing an object type (node, user, taxonomy, whatever) is really
simple: it's a class with only 3 methods (register dependencies, save,
and update), using a custom registry in which the developer can store
and retrieve data before and after serialization.
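To make the three-method contract concrete, here is a minimal sketch in Python (the module itself is PHP). Every name in it (`Registry`, `SyncEntity`, `Vocabulary`, `Term`, `register_dependencies`, `save`, `update`) is hypothetical, and it assumes that "save" gathers data on the master before serialization while "update" writes the unserialized data on the client; the real module may split the work differently.

```python
from abc import ABC, abstractmethod


class Registry:
    """Hypothetical key/value store an entity uses to stash data
    before serialization and read it back afterwards."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class SyncEntity(ABC):
    """Hypothetical base class: one subclass per object type
    (node, user, taxonomy term, ...), each with only three methods."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.registry = Registry()

    @abstractmethod
    def register_dependencies(self):
        """Return the entities that must be synced before this one."""

    @abstractmethod
    def save(self):
        """Master side (assumed): collect data into the registry."""

    @abstractmethod
    def update(self, local_store):
        """Client side (assumed): write registry data into the local site."""


class Vocabulary(SyncEntity):
    def register_dependencies(self):
        return []  # a vocabulary depends on nothing

    def save(self):
        self.registry.set("name", f"vocabulary {self.identifier}")

    def update(self, local_store):
        local_store[("vocabulary", self.identifier)] = self.registry.get("name")


class Term(SyncEntity):
    def __init__(self, identifier, vid):
        super().__init__(identifier)
        self.vid = vid

    def register_dependencies(self):
        # A term cannot be created before its vocabulary exists.
        return [Vocabulary(self.vid)]

    def save(self):
        self.registry.set("name", f"term {self.identifier}")
        self.registry.set("vid", self.vid)

    def update(self, local_store):
        local_store[("term", self.identifier)] = (
            self.registry.get("name"),
            self.registry.get("vid"),
        )
```

A plain dict stands in for the client site here; the point is only that the generic machinery never needs to know whether it is handling a term or a vocabulary.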
All error handling is exception-oriented, and an unrecoverable error in
a higher software layer won't make the lower layers fail.
Object fetching is based on a push/pull mechanism. The server pushes the
sync order, and the client responds OK or not. If OK, the client creates
a job using the DataSync module, which allows it to run as a CLI thread
(which won't hurt the web server, and allows us a larger memory limit at
run time).
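The push side of that handshake can be sketched as below, again in Python and with hypothetical names (`SyncOrder`, `Client.receive_order`, `Server.push`). It assumes the server generates an opaque transaction id with the order; the client's "job queue" is just a list standing in for the DataSync job that would actually run from the CLI.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SyncOrder:
    transaction_id: str
    profile: str


@dataclass
class Client:
    name: str
    accepting: bool = True
    queued_jobs: list = field(default_factory=list)

    def receive_order(self, order):
        """Answer the server's push with OK (True) or not (False).

        On OK, only queue a job; the heavy work happens later outside
        the web request (a DataSync CLI job in the module).
        """
        if not self.accepting:
            return False
        self.queued_jobs.append(order.transaction_id)
        return True


class Server:
    def __init__(self):
        self.pending = {}  # transaction id -> (client name, profile)

    def push(self, client, profile):
        order = SyncOrder(uuid.uuid4().hex, profile)
        if client.receive_order(order):
            self.pending[order.transaction_id] = (client.name, profile)
            return order.transaction_id
        return None  # client refused; nothing is recorded
```

Keeping the client's answer to a cheap yes/no is what lets the expensive import run later, in a separate process.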
I am generally not happy with DataSync's approach of running shell
scripts as the webserver user. Have you considered using Drush instead?
Drush might be something to look at, but in fact I chose DataSync
because of its transaction support.
...
...
During the DataSync job execution, the client pulls an initial set of
content and, while browsing it, fetches dependencies incrementally (by
pulling from the server again), based on xmlrpc (the fetching component
is also abstracted, and could use any communication method other than
xmlrpc).
Wouldn't the server be better qualified to decide which data the
client needs?
The server decides: it gives a transaction id to the client, then the
client requests data (at pull time) by giving its transaction id,
without knowing what is coming. The whole import is handled by the
client browsing a list of abstract entities without knowing their exact
implementation.
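A sketch of the pull side under the same assumptions, in Python with hypothetical names (`Transport`, `InProcessTransport`, `MasterServer.pull`, `import_transaction`). The transport interface stands in for the abstracted fetching component: in the module it is xmlrpc, here an in-process test double. The client passes only the transaction id and hands whatever comes back to a generic importer.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical abstraction over the fetching channel
    (xmlrpc in the module, but it could be anything else)."""

    @abstractmethod
    def call(self, method, *args):
        ...


class InProcessTransport(Transport):
    """Test double that calls the server object directly."""

    def __init__(self, server):
        self.server = server

    def call(self, method, *args):
        return getattr(self.server, method)(*args)


class MasterServer:
    def __init__(self):
        self.transactions = {}

    def open_transaction(self, tid, entities):
        # The server alone decides what belongs to this transaction.
        self.transactions[tid] = list(entities)

    def pull(self, tid):
        # Unknown ids yield nothing rather than an error.
        return self.transactions.get(tid, [])


def import_transaction(transport, tid, importer):
    """Client side: pull by transaction id only, then pass each abstract
    entity to a generic importer without inspecting its concrete type."""
    imported = []
    for entity in transport.call("pull", tid):
        importer(entity)
        imported.append(entity["id"])
    return imported
```

Swapping xmlrpc for another channel would then mean writing one more `Transport` subclass, with no change to `import_transaction`.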
...
...
To be unobtrusive on the system, a smart unset() is done after building
each dependency subtree, and there is a recursion breaker in case of
circular dependencies.
Have you tried it with PHP 5.3?
PHP 5.3 has too many differences from prior versions; I don't really
want to support it. In fact, it may already be outdated because of
PHP 6 development.
...
...
This module was created because the Deploy module seems so unstable
that I did not want to risk running clients' production sites with it.
I started implementing a sort of "deploy plan" using profiles based on
views: you construct a set of views and save them in a profile, then all
objects that these views reference will be synchronized.
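Resolving such a profile amounts to taking the union of everything its views return, without duplicates. The Python sketch below assumes a profile is just a named list of view names and that `run_view` (hypothetical, standing in for executing a Drupal view) returns entity references; both shapes are illustrative only.

```python
def collect_profile_entities(profile, run_view):
    """Union of the entities referenced by every view in a profile,
    deduplicated and kept in first-seen order."""
    seen = set()
    result = []
    for view_name in profile["views"]:
        for entity in run_view(view_name):
            if entity not in seen:
                seen.add(entity)
                result.append(entity)
    return result
```

Each collected entity would then go through the dependency walk, so the profile only needs to list the "top-level" content.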
Right now, the module fully synchronizes taxonomy and content types,
partially synchronizes users (including core account information and
passwords), and I have a small bug left to fix with nodes (a revision
problem, I think).
There might be a performance or overhead problem with this design on
a very large amount of data; it could break easily.
How large is your "very large"? If I wanted to sync 10k nodes to 100
client sites, how successful would I be?
I can't tell you that right now; I'm in active development and have
only tested on small amounts of data (about ten nodes).

I need to test and benchmark it to discover its limits; it's at an early
development stage right now.
...
...
The only way to be sure it won't break, I think, is to migrate data in
numerous small sets. But the problem with doing this is that it will be
really hard to keep the transactional context of the DataSync module.
Yeah, one reason to let the server handle this, no?
...
There are a lot of other custom goodies coming.
First, what do you think about such a module? Should I commit it to
drupal.org? Are people interested?
I am certainly interested, especially if my concerns from above can be
addressed. ;)
...
And now that I have described the module, what name should I give it,
considering that I'll probably commit it to drupal.org if people are
interested?
I thought about "YAMM" (Yet Another Migration Module) or "YADM" (Yet
Another Deployment Module).
There are *a lot* of modules that want to do the same thing as this
one; I just want a simple and expressive name.
Data migration is an important and diverse task. IMO it doesn't hurt
to have several approaches.
Cheers,
  Gerhard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iEYEARECAAYFAkp0YGAACgkQfg6TFvELooSOzACfUr5q/9Eu5b8YETgXu6CNYLZN
JugAn1j8/8nlbVV55RmsP9ZLc9px35/A
=rk5A
-----END PGP SIGNATURE-----
Pierre.