[development] Solving the dev->staging->live problem

Mon Aug 11 21:36:08 UTC 2008

You make a lot of good points. I guess from here on in, this
discussion should move to the prototype arena, and working together
with Greg, Alex, and others to see what progress can be made.

Victor Kane

On Mon, Aug 11, 2008 at 6:09 PM, Sam Boyer <drupal at samboyer.org> wrote:
> On Monday 11 August 2008 14:38:27 Victor Kane wrote:
>> Your points are interesting, but I think there may be a lot to what
>> Greg Dunlap is recommending in terms of thinking in Drupal logic terms
>> and not database terms.
>>
>> The image I have in my mind is that the database is kind of a
>> two-dimensional projection of three dimensions; that is, there may be
>> many hidden relationships in the process side of things that have to
>> be taken into account for deployment, especially since the database is
>> usually considered in purely MySQL terms, that is, with fully
>> transparent relationships between tables.
>>
>> But given concrete client driven circumstances, of course, in a given
>> instance with a given set of priorities, I can easily see how what you
>> are saying could make sense.
>>
>> Victor
>
> At this point then, I suspect we're talking past each other. First, are you
> referring only to the direct-database sync point that I made, or is that also
> in reference to the drupal-object exporting, git-managed system? If it's the
> latter as well as the former, then we've _really_ diverged.
>
> Just because I'm suggesting that there are direct-database syncing solutions
> possible does NOT mean that I'm not thinking in Drupal logic terms. Let me try
> a different, potentially clearer way:
>
> In your approach, it seems to me that the goal is to use existing drupal API
> functions - let's use nodes, so node_load() - to fully build a node object.
> That object is then exported to code, at which point we can do things with
> vcs, etc. During a deployment, we parse in that code and utilize the API
> counterpart to our data-getter function, so node_save(), to push the
> information into the database. Before node_save() is actually run, however,
> we can use something like a GUID system to change primary keys as necessary so
> that we've got the node from server A going into the right associated node on
> server B.
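
A minimal sketch of the round trip described above, written in Python rather than PHP so that it runs on its own. The Site class, export_node(), import_node(), and the "guid" field are hypothetical stand-ins for illustration; only the node_load()/node_save() names come from the thread, and the methods here merely imitate them.

```python
import json


class Site:
    """Toy stand-in for one Drupal site: nodes keyed by a site-local nid."""

    def __init__(self):
        self.nodes = {}
        self.next_nid = 1

    def node_save(self, node):
        # Imitates insert-or-update: assign a new nid when none is set.
        if node.get("nid") is None:
            node["nid"] = self.next_nid
            self.next_nid += 1
        self.nodes[node["nid"]] = dict(node)
        return node["nid"]

    def node_load(self, nid):
        return dict(self.nodes[nid])

    def nid_for_guid(self, guid):
        # Find the local primary key for a globally unique id, if any.
        for nid, node in self.nodes.items():
            if node["guid"] == guid:
                return nid
        return None


def export_node(site, nid):
    """Serialize a node to stable, diff-friendly text for version control."""
    node = site.node_load(nid)
    node.pop("nid")  # the local primary key means nothing on another server
    return json.dumps(node, sort_keys=True, indent=2)


def import_node(site, text):
    """Parse exported text and remap the GUID to the target's own nid."""
    node = json.loads(text)
    node["nid"] = site.nid_for_guid(node["guid"])  # None means "create"
    return site.node_save(node)
```

The local nid never leaves the source site; the GUID is the only identity that travels, which is what lets the same node land on a different primary key on server B.
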
>
> As I said, I can see arguments for that, particularly if it utilizes a git
> backend. And it only differs from what I'm proposing in a few ways.
>
> What I'm suggesting stems from Greg's original point about a GUID for every
> 'thing'. 'Things' being, to use your words:
>
> On Monday 11 August 2008 05:33:49 Victor Kane wrote:
>> ...
>> all entities, configurations, including exported views, content types,
>> panels, hopefully menus and module configurations and
>> exported variables
>> ...
>
> Because I agree completely that the _only_ way to arrive at a solution that
> works for drupal is to think in terms of the 'things' (I'm going to say
> 'items' from here on out) it makes - not to try to just grab bits of data from
> here and there and hope it all lines up on the other end. I've always thought
> that, and am pretty sure I always will.
>
> My proposal is for a deploy API that would let modules define what these items
> are, and then define a set of behaviors for managing the deployment of those
> items across a variety of different circumstances. For our concrete example, I
> suspect the _first_ thing I'd do in the synchro handler for nodes is to call
> node_load(), then follow it up with some internal logic that...I dunno, there
> are a lot of ways we could go from there. It could dynamically construct a
> delta from the last sync time with the remote server; it could just fire over
> the whole node object. If extension modules need to do something that the node
> module's deploy handler couldn't work out, no problem - those modules just
> need to register their interest in deploy transactions related to the GUID for
> that particular item. On the receiving end, the node module's deploy API
> implementation knows what to expect coming through the pipe and handles it
> accordingly - maybe through node_save(), maybe not.
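
One possible shape for such a deploy API, sketched in Python under stated assumptions. DeployRegistry and its method names are invented for illustration and do not correspond to any existing Drupal module: each item type registers a send and a receive handler, and extension modules register interest in a specific GUID.

```python
class DeployRegistry:
    """Hypothetical registry of per-item-type deploy handlers."""

    def __init__(self):
        self.handlers = {}    # item type -> (send handler, receive handler)
        self.interested = {}  # guid -> list of extension callbacks

    def register_type(self, item_type, send, receive):
        # The module that owns an item type defines how it is deployed.
        self.handlers[item_type] = (send, receive)

    def register_interest(self, guid, callback):
        # Extension modules hook into transactions for one specific item.
        self.interested.setdefault(guid, []).append(callback)

    def send(self, item_type, guid, item):
        send_handler, _ = self.handlers[item_type]
        payload = send_handler(item)  # whole object, delta, etc.
        for callback in self.interested.get(guid, []):
            callback(guid, payload)   # extensions may amend the payload
        return payload

    def receive(self, item_type, payload):
        # The receiving side knows what to expect for this item type.
        _, receive_handler = self.handlers[item_type]
        return receive_handler(payload)
```

Whether the send handler ships a delta or the whole object, and whether the receive handler ends in node_save() or not, stays a private decision of the module that registered the type.
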
>
> The advantage here, as I see it, is the potential to drill down _very_ quickly
> on exactly what should be checked during a given changeset. Very quickly as
> in, potentially, a single query. I can't picture the schema for the deploy
> items table, so it may take more, but it could be as simple as a single SELECT
> query that grabs all the items which have been changed/created since the last
> deploy txn, and that that particular deploy txn is interested in (again, a dev
> <=> qa txn != staging <=> live txn), and then it's a simple question of
> iterating through modules that have something to say about how each of the
> items is deployed.
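
To make the "single SELECT" idea concrete, here is a sketch using Python's sqlite3 module. The schema is entirely an assumption (the thread explicitly leaves it open): a deploy_items table recording a GUID, item type, and last-changed timestamp, plus a deploy_txn_interest table saying which item types each kind of transaction cares about.

```python
import sqlite3

# Hypothetical schema; table and column names are invented for illustration.
SCHEMA = """
CREATE TABLE deploy_items (
    guid      TEXT PRIMARY KEY,
    item_type TEXT NOT NULL,      -- e.g. 'node', 'view', 'variable'
    changed   INTEGER NOT NULL    -- unix timestamp of the last change
);
CREATE TABLE deploy_txn_interest (
    txn_type  TEXT NOT NULL,      -- e.g. 'dev-qa', 'staging-live'
    item_type TEXT NOT NULL
);
"""


def changed_items(conn, txn_type, last_sync):
    """Return (guid, item_type) rows that this kind of txn cares about
    and that were changed or created after last_sync."""
    return conn.execute(
        """
        SELECT i.guid, i.item_type
          FROM deploy_items AS i
          JOIN deploy_txn_interest AS t ON t.item_type = i.item_type
         WHERE t.txn_type = ? AND i.changed > ?
         ORDER BY i.guid
        """,
        (txn_type, last_sync),
    ).fetchall()
```

The result is the list of items to iterate over; each row would then be handed to whichever modules have something to say about that item type.
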
>
> As I said, I can see ways that a git-driven system can probably provide
> similar speed when it comes to drilling down to what items need to be
> considered in a given txn; also, a git-driven system has the added benefit of
> being able to, even when your local system is offline, still provide the entire
> version history on demand for _each server_ you've ever connected with on that
> project. Well, assuming your remote git branches are up to date.
>
> As far as I can tell, this is really the kind of thing you're talking about
> when you say:
>> ...the database is kind of a two-dimensional projection of three dimensions
>> that is, there may be many hidden relationships in the process side of
>> things that have to be taken into account for deployment...
>
> I can think of two very different ways of interpreting that metaphor, both of
> which are applicable to the topic at hand. I'm hoping, though, that this
> explanation finally does make clear that I'm _not_ thinking along the lines of
> 'how do we make an sqldump better?', but instead about methods for making
> deployment a process that's as smart about drupal data as drupal itself is.
>
> s
>
>>
>> On Mon, Aug 11, 2008 at 3:51 PM, Sam Boyer <drupal at samboyer.org> wrote:
>> > On Monday 11 August 2008 05:33:49 Victor Kane wrote:
>> >> The serialization and unserialization of data is included in my
>> >> approach to the problem for the purposes of the independent
>> >> transmission of nodes from one system to another, as in the case of
>> >> one Drupal site availing itself of a node.save service on another
>> >> Drupal site.
>> >>
>> >> It also has the purpose of guaranteeing insofar as is possible a text
>> >> version of all entities, configurations, including exported views,
>> >> content types, panels, hopefully menus and module configurations and
>> >> exported variables, for the purposes of continued version control and
>> >> hence deployment also (serialization to text, unserialization to
>> >> deployment objective).
>> >>
>> >> Here of course, serialization and unserialization is not meant in the
>> >> php function sense, and could include marshaling and unmarshaling to
>> >> and from XML, and is a cross-language concept.
>> >>
>> >> Victor Kane
>> >> http://awebfactory.com.ar
>> >
>> > So my initial reaction was that this was actually a disadvantage - it
>> > seemed to introduce an extra layer of unnecessary complexity, as it
>> > requires pulling the data out of the db, coming up with a new storage
>> > format, then transferring that format and reintegrating it into another
>> > db. The backup and project-level revisioning control implications are
>> > interesting - but that's a wholly different axis from the crux of the
>> > deployment paradigm, where there's _one_ version.
>> >
>> > However, on further reflection, I can see there being some strong
>> > arguments in either direction. Your approach, Victor, makes me drift back
>> > to the recent vcs thread on this list, as I can't imagine such a system
>> > being feasible and really scalable without the use of something like
>> > (gasp!) git. Two basic reasons for that:
>> >
>> > 1. Data integrity assurance: there's nothing like a SHA1 hash to ensure
>> > that nothing gets corrupted in all the combining/recombining of data
>> > through and around various servers. And then there's the whole
>> > content-addressable filesystem bit - I'd conjecture that it would make
>> > git exceptionally proficient at handling either database dumps or tons of
>> > data structured from 'export' functionality, whichever the case may be. I
>> > imagine Kathleen might be able to speak more to that.
>> >
>> > 2. Security and logic (and speed): if run on a git backend, I'd speculate
>> > that we could use project-specific ssh keys to encrypt all the
>> > synchronizations (although that obviously brings up a host of other
>> > potential requirements/issues). On the logic end, we could build on top
>> > of git's system for organizing sets of commits to allow for different
>> > 'types' of syncs (i.e., you're working with different datasets when
>> > doing dev <=> qa vs. when you're working with live <=> staging). As for
>> > speed...well, I'll just call that a hunch =)
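
The data-integrity point (1) above can be illustrated with a short Python sketch. In a SHA-1 repository, git names a file's contents (a "blob") by hashing a small header followed by the bytes themselves, so any corruption of an exported item changes its ID. The verify() helper is a hypothetical addition for illustration, not part of git.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content.

    The hashed input is the header "blob <size>" plus a NUL byte,
    followed by the raw content.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    # Hypothetical helper: check that received bytes match the ID they
    # were stored or announced under.
    return git_blob_id(data) == expected_id
```

Because the ID is derived from the content, identical exports stored on different servers get the same name, and a mismatch on the receiving end signals that something was altered in transit.
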
>> >
>> > This approach would require a _lot_ of coding, though. The more
>> > immediate-term solution that still makes the most sense to me is one
>> > where we let modules work straight from the data as it exists in the db,
>> > and define some logic for handling synchronizations that the deploy API
>> > can manage. But if all the systems are to be implemented...well, then it
>> > probably means a pluggable Deploy API that allows for different
>> > subsystems to handle different segments of the overall process.
>> >
>> > Sam
>
>