[development] Solving the dev->staging->live problem

Sam Boyer drupal at samboyer.org
Mon Aug 11 18:51:31 UTC 2008

On Monday 11 August 2008 05:33:49 Victor Kane wrote:
> The serialization and unserialization of data is included in my
> approach to the problem for the purposes of the independent
> transmission of nodes from one system to another, as in the case of
> one Drupal site availing itself of a node.save service on another
> Drupal site.
> It also has the purpose of guaranteeing insofar as is possible a text
> version of all entities, configurations, including exported views,
> content types, panels, hopefully menus and module configurations and
> exported variables, for the purposes of continued version control and
> hence deployment also (serialization to text, unserialization to
> deployment objective).
> Here of course, serialization and unserialization is not meant in the
> php function sense, and could include marshaling and unmarshaling to
> and from XML, and is a cross-language concept.
> Victor Kane
> http://awebfactory.com.ar
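
Victor's round-trip idea - serialize to text on one site, unserialize on another - could be sketched roughly like this (a Python illustration only; the node fields and function names are hypothetical, not Drupal's actual node.save service):

```python
import json

def serialize_node(node: dict) -> str:
    """Serialize a node to a canonical, diff-friendly text form.

    Sorted keys and fixed indentation keep the output byte-stable,
    so version control only shows real content changes.
    """
    return json.dumps(node, sort_keys=True, indent=2)

def unserialize_node(text: str) -> dict:
    """Reverse the serialization on the receiving site."""
    return json.loads(text)

# Round trip: what one site exports, another can re-import unchanged.
node = {"nid": 42, "type": "article", "title": "Hello", "body": "..."}
assert unserialize_node(serialize_node(node)) == node
```

JSON stands in here for whatever text format is chosen; as Victor notes, the concept is cross-language and could just as well be XML marshaling.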

So my initial reaction was that this was actually a disadvantage - it seemed 
to introduce an extra layer of unnecessary complexity, as it requires pulling 
the data out of the db, coming up with a new storage format, then transferring 
that format and reintegrating it into another db. The backup and project-level 
revision control implications are interesting - but that's a wholly 
different axis from the crux of the deployment paradigm, where there's _one_ 
canonical dataset moving between environments.

However, on further reflection, I can see strong arguments in either 
direction. Your approach, Victor, makes me drift back to the recent vcs 
thread on this list, as I can't imagine such a system being feasible and 
genuinely scalable without the use of something like (gasp!) git. Two basic 
reasons for that: 

1. Data integrity assurance: there's nothing like a SHA1 hash to ensure that 
nothing gets corrupted in all the combining/recombining of data through and 
around various servers. And then there's the whole content-addressable 
filesystem bit - I'd conjecture that git would be exceptionally 
proficient at handling either database dumps or large volumes of structured 
'export' output, whichever the case may be. I imagine Kathleen might be 
able to speak more to that. 
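
The content-addressable point can be made concrete: git names every object by a hash of its bytes, so identical content always gets the same id and any corruption changes the id. A small Python sketch of how git derives a blob id (this mirrors git's documented object format; it's an illustration, not part of any Drupal module):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob.

    Git hashes a short header ("blob <size>\\0") plus the raw bytes.
    Deterministic ids are what make the store content-addressable
    and make corruption immediately detectable.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

dump = b"hello\n"
print(git_blob_id(dump))  # same id `git hash-object` gives these bytes
```

Feed it a database dump or an exported-config file and any single-bit change on any server shows up as a different id.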

2. Security and logic (and speed): if run on a git backend, I'd speculate that 
we could use project-specific ssh keys to secure all the synchronizations 
(although that obviously brings up a host of other potential 
requirements/issues). On the logic end, we could build on top of git's system 
for organizing sets of commits to allow for different 'types' of syncs (i.e., 
you're working with different datasets when doing dev <=> qa than when 
syncing live <=> staging). As for speed...well, I'll just call that a 
hunch =)
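
The "different datasets per sync type" idea might look something like this sketch (every name here - the profiles and the dataset labels - is hypothetical, not an existing Deploy API):

```python
# Hypothetical mapping of sync 'types' to the datasets they move.
# dev <=> qa carries more (config, fixtures) than staging -> live,
# which should only ever push vetted content.
SYNC_PROFILES = {
    "dev_to_qa": {"content", "config", "test_fixtures"},
    "staging_to_live": {"content"},
}

def datasets_for(sync_type: str) -> set:
    """Return the set of datasets a given sync type is allowed to move."""
    try:
        return SYNC_PROFILES[sync_type]
    except KeyError:
        raise ValueError("unknown sync type: %s" % sync_type)
```

The point is just that the sync logic, not the operator, decides which data crosses each boundary.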

This approach would require a _lot_ of coding, though. The more immediate-term 
solution that still makes the most sense to me is one where we let modules 
work straight from the data as it exists in the db, and define some logic for 
handling synchronizations that the deploy API can manage. But if all the 
systems are to be implemented...well, then it probably means a pluggable 
Deploy API that allows for different subsystems to handle different segments 
of the overall process.
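
A pluggable Deploy API in that spirit might be sketched like so (a Python illustration with made-up segment names and class names - a sketch of the plugin idea, not a proposal for the actual module structure):

```python
# Minimal sketch of a pluggable deploy pipeline: different subsystems
# register for different segments of the overall process.
class DeployPipeline:
    def __init__(self):
        self._handlers = {}

    def register(self, segment, handler):
        """Plug a subsystem in for one segment (e.g. 'extract')."""
        self._handlers[segment] = handler

    def run(self, payload):
        """Pass the payload through whichever segments have handlers."""
        for segment in ("extract", "transfer", "integrate"):
            handler = self._handlers.get(segment)
            if handler:
                payload = handler(payload)
        return payload

pipeline = DeployPipeline()
pipeline.register("extract", lambda p: {**p, "extracted": True})
pipeline.register("integrate", lambda p: {**p, "integrated": True})
result = pipeline.run({"nid": 42})
```

Under a scheme like this, the straight-from-the-db approach and a git-backed serialization approach could coexist as alternative subsystems behind the same API.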

