[development] Call to developers, Yet another migration module / how to choose a module name?

Sat Aug 1 15:33:52 UTC 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Pierre Rineau wrote:
> Hello all,
> 
> Working on some custom project for my company, I developed a module to
> do massive migration between sites.
> 
> This module uses a full OO layer.
> 
> Its internal mechanism is based on abstracting the objects to migrate
> from a master site to clients. This abstraction defines how to construct
> the object dependency tree and how to serialize objects.
> 
> Object implementations (node, user, taxonomy, whatever) are really
> simple to write: each is a class with only three methods (register
> dependencies, save, and update), using a custom registry that lets the
> developer save and get back data before and after serialization.
> 
> All error handling is exception oriented, and lower software layers
> won't fail on higher layers' unrecoverable errors.
> 
> Object fetching is based on a push/pull mechanism. The server pushes
> the sync order, and the client responds OK or not. If OK, it creates a
> job using the DataSync module, which allows it to run as a CLI thread
> (which won't hurt the web server, and allows us a larger memory limit
> at run time).

I am generally not happy with DataSync's approach of running shell
scripts as the web server user. Have you considered using Drush instead?
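
For readers following along, here is a minimal sketch of the three-method
object abstraction described in the quoted text above. It is written in
Python rather than PHP purely for illustration, and every name in it
(MigratableObject, Term, the registry dict) is a hypothetical stand-in, not
the module's actual API.

```python
from abc import ABC, abstractmethod


class MigratableObject(ABC):
    """Hypothetical base class: one subclass per kind of object to migrate."""

    def __init__(self, identifier):
        self.identifier = identifier
        # Assumed stand-in for the "custom registry": a plain dict where an
        # implementation can stash data before and after serialization.
        self.registry = {}

    @abstractmethod
    def register_dependencies(self):
        """Return identifiers of objects that must exist before this one."""

    @abstractmethod
    def save(self):
        """Create the object on the client site."""

    @abstractmethod
    def update(self):
        """Update an object that already exists on the client site."""


class Term(MigratableObject):
    """Illustrative taxonomy term that depends only on its parent term."""

    def __init__(self, identifier, parent=None):
        super().__init__(identifier)
        self.parent = parent

    def register_dependencies(self):
        return [self.parent] if self.parent is not None else []

    def save(self):
        self.registry["status"] = "created"

    def update(self):
        self.registry["status"] = "updated"
```

The point of keeping the interface this small would be that adding a new
object type only means writing one such class.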

> During the DataSync job execution, the client pulls an initial set
> of content and, while browsing it, fetches dependencies incrementally
> (by pulling from the server again), based on XML-RPC (the fetching
> component is also abstracted, so any communication method other than
> XML-RPC could be used).

Wouldn't the server be better qualified to decide which data the
client needs?

> To be unobtrusive on the system, a smart unset() is done after
> building a dependency subtree, and there is a recursion breaker in
> case of circular dependencies.

Have you tried it with PHP 5.3?
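
To make the incremental dependency fetching and the recursion breaker from
the quoted text concrete, here is a rough sketch in Python (not the module's
PHP code). It assumes the breaker is simply a set of already-visited
identifiers; fetch_dependencies is a hypothetical callable standing in for a
pull from the server, and the identifiers are made up.

```python
def pull_with_dependencies(root_ids, fetch_dependencies):
    """Return object ids ordered so that dependencies come before dependents.

    fetch_dependencies(obj_id) is an assumed callable that asks the server
    for the ids the given object depends on.
    """
    ordered = []
    seen = set()  # recursion breaker: every id is visited at most once

    def visit(obj_id):
        if obj_id in seen:
            return  # already handled, or part of a cycle being walked
        seen.add(obj_id)
        for dep_id in fetch_dependencies(obj_id):
            visit(dep_id)
        ordered.append(obj_id)

    for root_id in root_ids:
        visit(root_id)
    return ordered


# Made-up dependency graph with a cycle between term:2 and term:3.
graph = {
    "node:1": ["user:7", "term:3"],
    "user:7": [],
    "term:3": ["term:2"],
    "term:2": ["term:3"],
}
order = pull_with_dependencies(["node:1"], lambda obj_id: graph.get(obj_id, []))
```

Within a cycle there is no valid order, so the walk just stops at the repeated
id. A recursive walk like this could also hit the interpreter's recursion
limit on very deep trees.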

> This module was created because the deploy module seems so unstable
> that I did not want to risk running clients' production sites with
> it. I started implementing a sort of "deploy plan" using profiles
> based on views: you construct a set of views and save them in a
> profile, then all objects that these views reference will be
> synchronized.
> 
> Right now, the module fully synchronizes taxonomy and content types,
> partially synchronizes users (including core account information and
> passwords), and I have a small bug left to handle with nodes (a
> revision problem, I think).
> 
> There might be a performance or overhead problem with this design:
> with a very large amount of data, it could break easily.

How large is your "very large"? If I wanted to sync 10k nodes to 100
client sites, how successful would I be?

> The only way to be sure it won't break is, I think, to migrate data
> in numerous small sets. But the problem with doing this is that it
> will be really hard to keep the transactional context of the
> DataSync module.

Yeah, one reason to let the server handle this, no?
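
As a small illustration of the "numerous small sets" idea, the following
Python sketch splits a list of object ids into fixed-size batches. The
function name and the batch size are assumptions made for the example. It
also shows the trade-off mentioned in the quoted text: each batch would be its
own job, so nothing here preserves a single transactional context across
batches.

```python
from itertools import islice


def batches(ids, size):
    """Yield successive lists of at most `size` ids from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical usage: 10 node ids synced 4 at a time.
node_ids = [f"node:{n}" for n in range(1, 11)]
jobs = list(batches(node_ids, 4))
```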

> There are a lot of other custom goodies coming.
> 
> First thing is, what do you think about such a module? Should I
> commit it on drupal.org? Are there people interested?

I am certainly interested, especially if my concerns from above can be
addressed. ;)

> And, now that I have described the module, what name should I give
> it, considering the fact I'll probably commit it on drupal.org, if
> people are interested?
> 
> I thought about "YAMM" (Yet Another Migration Module), or "YADM" (Yet
> Another Deployment Module).
> 
> The fact is there are *a lot* of modules which want to do the same
> thing as this one; I just want a simple and expressive name.

Data migration is an important and diverse task. IMO it doesn't hurt
to have several approaches.

Cheers,
	Gerhard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkp0YGAACgkQfg6TFvELooSOzACfUr5q/9Eu5b8YETgXu6CNYLZN
JugAn1j8/8nlbVV55RmsP9ZLc9px35/A
=rk5A
-----END PGP SIGNATURE-----