[development] Call to developers, Yet another migration module / how to choose a module name?
Pierre Rineau
pierre.rineau at makina-corpus.com
Sat Aug 1 15:43:25 UTC 2009
On Sat, 2009-08-01 at 17:33 +0200, Gerhard Killesreiter wrote:
>
> Pierre Rineau schrieb:
> > Hello all,
> >
> > Working on a custom project for my company, I developed a module to
> > do mass migration between sites.
> >
> > This module uses a full OO layer.
> >
> > Its internal mechanism is based on abstracting the objects to migrate
> > from a master site to client sites. This abstraction defines how to
> > construct an object's dependency tree and how to serialize objects.
> >
> > Object implementations (node, user, taxonomy, whatever) are really
> > simple to write: each is a class with only three methods (register
> > dependencies, save, and update), using a small custom registry that
> > lets developers save and get back data before and after serialization.
> >
> > All error handling is exception oriented, and lower software layers
> > won't fail on unrecoverable errors raised in higher layers.
> >
> > Object fetching is based on a push/pull mechanism. The server pushes
> > the sync order and the client responds OK or not. If OK, the client
> > creates a job using the DataSync module, which lets it run as a CLI
> > process (which won't hurt the web server, and allows a larger memory
> > limit at run time).
>
> I am generally not happy with DataSync's approach of running shell
> scripts as the webserver user. Have you considered using Drush instead?
Drush might be something to look at, but in fact DataSync's transaction
support is the reason I chose that module.
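To give a rough idea of the object abstraction described above, here is
an illustrative sketch (class and method names are mine for the example,
not the module's real API):

  <?php
  // Illustrative sketch only: each migratable object type is wrapped in
  // a small class with three methods, plus a registry to stash data
  // before and after serialization.
  abstract class EntityWrapper {
    protected $registry = array();

    // Return the wrappers this object depends on (terms, users, ...).
    abstract public function registerDependencies();

    // Create the object on the client site.
    abstract public function save();

    // Update an already existing object on the client site.
    abstract public function update();

    public function set($key, $value) {
      $this->registry[$key] = $value;
    }

    public function get($key) {
      return isset($this->registry[$key]) ? $this->registry[$key] : NULL;
    }
  }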
> > During the DataSync job execution, the client pulls an initial set
> > of content, and while browsing it performs incremental dependency
> > fetching (by pulling the server again), based on xmlrpc (the fetching
> > component is also abstracted, and could use any communication
> > method other than xmlrpc).
>
> Wouldn't the server be better qualified to decide which data the
> client needs?
The server does decide: it gives a transaction id to the client, then
the client requests data (at pull time) by sending its transaction id,
without knowing what is coming. The whole import part is handled by the
client browsing a list of abstract entities without knowing their exact
implementation.
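On the wire it is roughly this (simplified sketch; xmlrpc() is Drupal
core's XML-RPC client helper, while the 'yamm.pull' method name and the
yamm_entity_unserialize() factory are only illustration):

  <?php
  // Simplified client-side pull: the client only knows its transaction
  // id, the server decides which entities come back for it.
  function yamm_client_pull($server_url, $transaction_id) {
    $items = xmlrpc($server_url, 'yamm.pull', $transaction_id);
    if ($items === FALSE) {
      throw new Exception('Pull failed: ' . xmlrpc_error_msg());
    }
    foreach ($items as $item) {
      // Hypothetical factory returning an abstract wrapper; the client
      // never needs to know the concrete implementation behind it.
      $entity = yamm_entity_unserialize($item);
      $entity->save();
    }
  }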
> > To stay unobtrusive on the system, a smart unset() is done after
> > building each dependency subtree, and there is a recursion breaker in
> > case of circular dependencies.
>
> Have you tried it with php 5.3?
PHP 5.3 has too many differences from prior versions; I don't really
want to support it. The fact is it may already be outdated because of
PHP 6 development.
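For what it's worth, the recursion breaker mentioned above is nothing
fancy; roughly along these lines (sketch only, with identifier()
standing in for whatever unique key an entity exposes):

  <?php
  // Sketch of the dependency walk: $seen holds identifiers of entities
  // already visited, so circular dependencies stop the descent instead
  // of looping forever.
  function yamm_build_dependency_tree($entity, array &$seen = array()) {
    $id = $entity->identifier();
    if (isset($seen[$id])) {
      // Already visited: break the recursion.
      return array();
    }
    $seen[$id] = TRUE;
    $branch = array('id' => $id, 'children' => array());
    foreach ($entity->registerDependencies() as $dependency) {
      $branch['children'][] = yamm_build_dependency_tree($dependency, $seen);
      // Drop the reference once its subtree is built so PHP can free it.
      unset($dependency);
    }
    return $branch;
  }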
> > This module was created because the deploy module seems so unstable
> > that I did not want to risk running clients' production sites with
> > it. I started implementing some sort of "deploy plan" using profiles
> > based on views: you construct a set of views, save them in a profile,
> > and then all objects that these views reference will be synchronized.
> >
> > Right now, the module fully synchronizes taxonomy and content types,
> > partially synchronizes users (including core account information and
> > passwords), and I have a small bug left to handle with nodes (a
> > revision problem, I think).
> >
> > There might be a performance or overhead problem with this design
> > when handling a very large amount of data; it could break easily.
>
> How large is your "very large"? If I wanted to sync 10k nodes to 100
> client sites, how successful would I be?
I can't tell you that right now; I'm in active development and only
test on small amounts of data (around ten nodes). I need to test and
benchmark it to discover its limits; it's at an early development stage.
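To make the views-based profile idea above more concrete, collecting the
nodes to push could look something like this (a sketch against the
Views 2 API; the function name is illustrative):

  <?php
  // Run every view listed in a profile and collect the node ids the
  // views return; those nodes (and their dependencies) are what gets
  // synchronized.
  function yamm_profile_collect_nids(array $view_names) {
    $nids = array();
    foreach ($view_names as $name) {
      $view = views_get_view($name);
      if (!$view) {
        continue;
      }
      $view->execute();
      foreach ($view->result as $row) {
        if (isset($row->nid)) {
          $nids[$row->nid] = (int) $row->nid;
        }
      }
    }
    return array_values($nids);
  }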
> > The only way to be sure it won't break is, I think, to migrate
> > content in numerous small sets of data. But the problem with doing
> > this is that it will be really hard to keep the transactional context
> > of the DataSync module.
>
> Yeah, one reason to let the server handle this, no?
>
> > There are a lot of other custom goodies coming.
> >
> > First thing is, what do you think about such a module? Should I
> > commit it on drupal.org? Are there people interested?
>
> I am certainly interested, especially if my concerns from above can be
> addressed. ;)
>
> > And, now that I have described the module, what name should I give
> > it, considering that I'll probably commit it on drupal.org if people
> > are interested?
> >
> > I thought about "YAMM" (Yet Another Migration Module) or "YADM" (Yet
> > Another Deployment Module).
> >
> > The fact is there are *a lot* of modules that want to do the same
> > thing as this one; I just want a simple and expressive name.
>
> Data migration is an important and diverse task. IMO it doesn't hurt
> to have several approaches.
>
> Cheers,
> Gerhard
Pierre.