[development] Call to developers, Yet another migration module / how to choose a module name?

Sat Aug 1 15:43:25 UTC 2009

On Sat, 2009-08-01 at 17:33 +0200, Gerhard Killesreiter wrote:
> Pierre Rineau wrote:
> > Hello all,
> > 
> > While working on a custom project for my company, I developed a module
> > to do massive migrations between sites.
> > 
> > This module uses a full OO layer.
> > 
> > Its internal mechanism is based on abstracting the objects to migrate
> > from a master site to clients. This abstraction defines how to build the
> > object dependency tree and how to serialize objects.
> > 
> > Implementing an object type (node, user, taxonomy, whatever) is really
> > simple: it is a class with only 3 methods (register dependencies, save,
> > and update), which uses a custom registry where the developer can store
> > data before serialization and get it back afterwards.
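Here is a minimal sketch of what such a three-method entity class and its registry could look like. It is written in Python for brevity rather than the module's PHP. Every name in it (`Registry`, `SyncEntity`, `TermEntity`, `register_dependencies`) is hypothetical and not taken from the module.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class Registry:
    """Hypothetical per-object store for data kept across serialization."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


class SyncEntity(ABC):
    """Hypothetical base class: one subclass per object type (node, user, term)."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier
        self.registry = Registry()

    @abstractmethod
    def register_dependencies(self) -> list[SyncEntity]:
        """Master side: return the entities this one depends on."""

    @abstractmethod
    def save(self) -> None:
        """Client side: create the object locally from registry data."""

    @abstractmethod
    def update(self) -> None:
        """Client side: update an object that already exists locally."""


class TermEntity(SyncEntity):
    """Toy taxonomy term whose only dependency is its optional parent."""

    def __init__(self, identifier: str, name: str, parent: TermEntity | None = None) -> None:
        super().__init__(identifier)
        self.parent = parent
        self.registry.set("name", name)
        self.saved = False

    def register_dependencies(self) -> list[SyncEntity]:
        return [self.parent] if self.parent else []

    def save(self) -> None:
        self.saved = True  # a real implementation would write to the database

    def update(self) -> None:
        self.saved = True  # likewise a stand-in for a database update
```

Keeping the interface this small means the generic layer only has to walk dependencies and call `save()` or `update()`, whatever the object type.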
> > 
> > All error handling is exception-oriented, and unrecoverable errors in
> > higher layers won't make the lower software layers fail.
> > 
> > Object fetching is based on a push/pull mechanism. The server pushes the
> > sync order, and the client responds OK or not. If OK, the client creates a
> > job using the DataSync module, which allows it to run as a CLI thread
> > (which won't hurt the web server, and gives us a larger memory limit at
> > run time).
> 
> I am generally not happy with DataSync's approach of running shell scripts
> as the web server user. Have you considered using Drush instead?

Drush might be worth looking at, but in fact I chose DataSync because of
its transaction support.
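The push/pull handshake described above can be sketched as follows, again in Python with hypothetical names (`SyncOrder`, `Client`, `push_sync_order`). The job list is only a stand-in for creating a DataSync job and does not reproduce DataSync's actual API.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class SyncOrder:
    """Hypothetical order pushed by the master site to a client."""
    transaction_id: str
    profile: str


@dataclass
class Client:
    """Toy client that accepts an order only when it is idle."""
    busy: bool = False
    jobs: list[SyncOrder] = field(default_factory=list)

    def receive_order(self, order: SyncOrder) -> bool:
        """Answer OK (True) or not; on OK, queue a job for a CLI worker."""
        if self.busy:
            return False
        # Stand-in for creating a DataSync job that later runs from the CLI.
        self.jobs.append(order)
        self.busy = True
        return True


def push_sync_order(client: Client, profile: str) -> str | None:
    """Master side: push an order; return its transaction ID if accepted."""
    order = SyncOrder(transaction_id=uuid.uuid4().hex, profile=profile)
    return order.transaction_id if client.receive_order(order) else None
```

A second push to a busy client returns `None`, so the master knows to retry later instead of stacking jobs.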

> > During the DataSync job execution, the client pulls an initial set of
> > content and, while browsing it, fetches dependencies incrementally (by
> > pulling from the server again) over XML-RPC (the fetching component is
> > also abstracted, and could use any communication method other than
> > XML-RPC).
> 
> Wouldn't the server be better qualified to decide which data the
> client needs?

The server decides: it gives a transaction ID to the client, then the
client requests data (at pull time) by giving its transaction ID, without
knowing what is coming. The whole import is handled by the client browsing
a list of abstract entities without knowing their exact implementation.
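Here is a rough sketch of that client-side pull. It assumes a hypothetical `Transport` interface (an XML-RPC client would be one implementation) and a toy serialized format in which each object carries its id and the ids it depends on. The client only knows its transaction ID; the server decides what the initial set contains, and missing dependencies are pulled on demand.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical fetching component; XML-RPC would be one implementation."""

    @abstractmethod
    def pull(self, transaction_id: str, ids: list[str] | None = None) -> list[dict]:
        """Return the server-chosen initial set (ids=None) or the requested objects."""


class InMemoryTransport(Transport):
    """Test double standing in for the master site."""

    def __init__(self, plan: dict[str, list[str]], objects: dict[str, dict]) -> None:
        self.plan = plan        # transaction ID -> ids the server decided to send
        self.objects = objects  # id -> serialized object with a "deps" list
        self.calls = 0          # number of round trips, for illustration

    def pull(self, transaction_id: str, ids: list[str] | None = None) -> list[dict]:
        self.calls += 1
        wanted = self.plan[transaction_id] if ids is None else ids
        return [self.objects[i] for i in wanted]


def import_transaction(transport: Transport, transaction_id: str) -> list[str]:
    """Client side: pull the initial set, then fetch missing dependencies on demand.

    Returns ids in import order: dependencies come before their dependents
    (a cycle is simply not followed past an object already seen).
    """
    imported: list[str] = []
    seen: set[str] = set()

    def visit(obj: dict) -> None:
        if obj["id"] in seen:
            return
        seen.add(obj["id"])
        missing = [d for d in obj["deps"] if d not in seen]
        if missing:
            # Incremental fetch: pull only the dependencies not seen yet.
            for dep in transport.pull(transaction_id, missing):
                visit(dep)
        imported.append(obj["id"])

    for obj in transport.pull(transaction_id):
        visit(obj)
    return imported
```

Consider a plan of one node `n1` that depends on a user `u1` and on a term `t1`, where `t1` has a parent term `t0`. The client makes three pulls and imports `u1`, `t0`, `t1`, then `n1`.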

> > To be unobtrusive on the system, targeted unset() calls are made after
> > each dependency subtree is built, and there is a recursion breaker in
> > case of circular dependencies.
> 
> Have you tried it with PHP 5.3?

PHP 5.3 differs too much from prior versions, so I don't really want to
support it. Besides, it may already be outdated because of PHP 6
development.
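The recursion breaker mentioned earlier can be illustrated with a depth-first walk that tracks which objects are still being expanded. This Python sketch assumes a cycle should be cut rather than treated as an error. The cut edges are returned so that they could be fixed up later (for example, in an update() pass). That is a guess; the module is not known to do this.

```python
from __future__ import annotations


def build_order(root: str, deps: dict[str, list[str]]) -> tuple[list[str], list[tuple[str, str]]]:
    """Depth-first walk listing dependencies before the objects that need them.

    Returns (order, cut_edges). `visiting` acts as the recursion breaker:
    reaching an id that is still being expanded means we followed a cycle,
    so that edge is cut and recorded instead of recursing forever.
    """
    order: list[str] = []
    cut_edges: list[tuple[str, str]] = []
    visiting: set[str] = set()
    done: set[str] = set()

    def walk(node: str) -> None:
        visiting.add(node)
        for child in deps.get(node, []):
            if child in done:
                continue  # shared dependency already handled
            if child in visiting:
                cut_edges.append((node, child))  # circular dependency: cut it
                continue
            walk(child)
        visiting.discard(node)
        done.add(node)
        order.append(node)

    walk(root)
    return order, cut_edges
```

Take `a` depending on `b` and `c`, `b` on `c`, and `c` back on `a`. The walk returns the order `c`, `b`, `a` and reports the cut edge `(c, a)`.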

> > I created this module because the Deploy module seems so unstable that
> > I did not want to risk running clients' production sites with it. I
> > started implementing a sort of "deploy plan" using profiles based on
> > Views: you build a set of views and save them in a profile, then all
> > objects referenced by these views are synchronized.
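A deploy profile could be modeled roughly like this. The sketch assumes each view can be reduced to rows of `(type, id)` references. `DeployProfile` and `objects_to_sync` are hypothetical names, and real Views results would need a module-specific adapter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# A "view" is modeled as any callable returning rows; each row lists the
# (type, id) pairs it references. This stands in for Drupal Views results.
View = Callable[[], list[list[tuple[str, int]]]]


@dataclass
class DeployProfile:
    """Hypothetical saved profile: a named set of views."""
    name: str
    views: list[View]

    def objects_to_sync(self) -> list[tuple[str, int]]:
        """Collect every object referenced by any view, without duplicates."""
        seen: set[tuple[str, int]] = set()
        result: list[tuple[str, int]] = []
        for view in self.views:
            for row in view():
                for ref in row:
                    if ref not in seen:
                        seen.add(ref)
                        result.append(ref)
        return result
```

The resulting list would then seed the dependency walk, so the profile only names the top-level objects.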
> > 
> > Right now, the module fully synchronizes taxonomy and content types,
> > partially synchronizes users (including core account information and
> > passwords), and I have one small bug left to fix with nodes (a revision
> > problem, I think).
> > 
> > This design might have performance or overhead problems with a very
> > large amount of data; it could break easily.
> 
> How large is your "very large"? If I wanted to sync 10k nodes to 100
> client sites, how successful would I be?

I can't tell you that right now; I'm in active development and only
testing on small amounts of data (about ten nodes).

I need to test and benchmark it to discover its limits; it is at an early
development stage right now.

> > The only way to be sure it won't break, I think, is to migrate data in
> > many small batches. But the problem with doing this is that it will be
> > really hard to keep the transactional context of the DataSync module.
> 
> Yeah, one reason to let the server handle this, no?
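One way to reconcile small batches with a single transactional context is to let the server split the planned ids into batches under one transaction ID. The client then stages the batches and commits only after the last one arrives. This is a hypothetical Python sketch: `batches` and `StagedImport` are illustrative names, and the "commit" is a stand-in for a real database transaction.

```python
from __future__ import annotations

from collections.abc import Iterator


def batches(ids: list[str], size: int) -> Iterator[list[str]]:
    """Server side: split the planned id list into batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


class StagedImport:
    """Client side: buffer batches under one transaction ID and commit only
    after the final batch, so a failure mid-way leaves the site untouched."""

    def __init__(self, transaction_id: str, total_batches: int) -> None:
        self.transaction_id = transaction_id
        self.total_batches = total_batches
        self.staged: list[str] = []
        self.received = 0
        self.committed: list[str] = []

    def receive(self, batch: list[str]) -> None:
        self.staged.extend(batch)
        self.received += 1
        if self.received == self.total_batches:
            self.committed = self.staged  # stand-in for a real commit
            self.staged = []
```

Each round trip stays small, while the all-or-nothing decision is still made once per transaction.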
> 
> > There are a lot of other custom goodies coming.
> > 
> > First, what do you think about such a module: should I commit it to
> > drupal.org? Are people interested?
> 
> I am certainly interested, especially if my concerns from above can be
> addressed. ;)
> 
> > And now that I have described the module, what name should I give it,
> > considering that I'll probably commit it to drupal.org if people are
> > interested?
> > 
> > I thought about "YAMM" (Yet Another Migration Module), or YADM (Yet
> > Another Deployment Module).
> > 
> > The fact is there are *a lot* of modules that want to do the same
> > thing as this one; I just want a simple and expressive name.
> 
> Data migration is an important and diverse task. IMO it doesn't hurt
> to have several approaches.
> 
> Cheers,
> 	Gerhard

Pierre.