[development] The right way to manage Drupal webiste development

Fri Aug 17 17:50:47 UTC 2007

Michel Levy-Provençal wrote:
> 1/ When we make some changes using the admin interface on a developer
> instance, we have to extract the DB and restore it into the
> integration instance... At this time each developper must import the
> integration DB in his instance each time there's an update. It's a
> huge work to determine what table is a content table, param table or
> system table...
> 
> 2/ When our sites will be online, we will need to continue the
> development. The only way we found to manage this is to lock param
> tables on the live instance. And when we'll want to update the live
> website, we'll have to :
>  - update the php code (thanks svn)
>  - and (this is quite touchy) update the param tables using a sql import....

This is an age old problem - how to merge database tables from multiple
developers each working with their own instance of the database.

Prior to deployment the problem is trivial.  A staging server houses the
'master' database which has the canonical data/configuration.  If
developers want an up to date configuration in their development
environment, they pull it from the staging server work on it and
manually reproduce tested changes to the staging server.  No attention
needs to be paid to a live instance of a site, since it doesn't exit.

This becomes complicated when a site goes live however.  The staging
server may still (from a development testing standpoint) contain the
desired data/configuration - BUT - the truth is the live site (from a
practical standpoint) *is canon*.  How does one effectively push/merge
changes to a live server that may now have dozens of administrators
acting on and changing the configuration in real time.  Or for that
matter pull/merge the changes from the live site.

Some configuration and content data can easily be separated, but there
are significant numbers of tables that blur the line.  CCK tables are a
prime example of this.  The tables for content types and their fields
define the content type, its configuration and store the data.  To add
to the problem the table structures can change at a moments notice if a
content type has a new field added to it.

One partial solution to this sort of problem is to regularly dump the
structures of the canonical database tables to files - and check them
into the repository.  The database structure check ins can be automated,
and scripts can be run during developer *code* check in to diff the
database structures. This will catch new/removed tables and changes to
their structure but offers no means of automating data merges.
Developer diligence is still required to merge data.

A similar approach can be used in conjunction with the database
structure check ins - where the actual data from *known* configuration
tables is checked in to the repository.  e.g. the system, variable or
permission tables.  In the case of variable or permission though you
must have some sort of test in place to ensure that the data is
consistent with the code (fixing a typo in a perm name in hook_perm has
consequences in a live environment).

Anyway - these are just some of my random thoughts on the topic.
Perhaps this discussion can move to g.d.o.  Maybe there are some new
ideas on separation of configuration and data or automated test
strategies to ensure code, data and configuration integrity that others
want to share.

andre