[development] Separating configuration vs data tables

Sat Jun 10 02:27:30 UTC 2006

Moshe Weitzman wrote:
>> My suggestion is to identify which database tables contain 
>> administrative or configuration data vs. which tables contain content 
>> and then store them in separate databases (brief research indicates 
>> that MySQL supports cross-database queries but Postgres does not, so 
>> this functionality would be turned off with Postgres).  Ideally, it 
>> would then be possible to clone both databases, add modules or change 
>> settings in the sandbox until happy, and then copy back only the 
>> admin/config database and any filesystem mods.  In one step, the 
>> production system would be upgraded.
>>
>> We would have to decide which tables fall into each category and 
>> possibly refactor some tables if they span both categories.  Perhaps 
>> it is not possible to separate them all cleanly in this way, but it 
>> seems worth trying.
>>
> 
> right and right. it would be outstanding if we could pull this off. 
> Migrating changes to production is a major headache, and this would make 
> it really easy to roll forward and even roll back.

> yes, that prefix trick is very cool and requires no changes from core or 
> contrib. but i have a better idea. Lets store this mapping in a new 
> configuration table. Then, modules or the installer can write into this 
> table. we just need to run this pseudocode in settings.php where the 
> $db_prefix stuff is now:
> 
> // set prefixes for each table as specified by the prefixes table
> $sql = SELECT prefix, name FROM prefixes;
> $result = db_query($sql);
> while ($row = db_fetch_object($result) {
>   $db_prefix[$row->name] = $row->prefix;
> }
> 
> from there on, queries will always hit the right DB.
> 
> as a next step, i sigest identifying which tables go where and resolving 
> splitting up ambigous tables

Good idea!  Below, some thoughts on making it better.  First, a bit of 
clarification on Postgres vs. MySQL.

Postgres has more capabilities than MySQL, and follows the SQL standard more 
closely.  What MySQL lacks is the 3rd tier of the object hierarchy in standard 
SQL.  Standard SQL specifies a database object by a triplet usually named 
<database>.<schema>.<object> or sometimes <database>.<owner>.<object>. 
Postgres follows this model.  For example, one might have a table named "node" 
in a schema named "content" in a database instance called "test", i.e. 
test.content.node.  It is true that with Postgres, each database instance has 
its own connection server, and thus SQL queries cannot cross database 
instances.  However, they can easily span multiple schema.

MySQL, in contrast is "one-third lame" so to speak.  It only has what it 
*calls* a database (but what is really a schema) and an object.  One MySQL 
server on one port has multiple MySQL-named databases (schema) with objects 
(e.g. tables) within them.

So, no need to leave Postgres out of this discussion.

Let's not forget people who are hosting Drupal sites at hosts which do not 
allow multiple MySQL databases, or perhaps only provide a limited number and 
the user is up against that limit already.

Instead, I suggest the switch code between "content" and "configuration" 
tables be smart enough to select either separate Postgres schema, separate 
MySQL databases, OR different *table prefixes*.  The one could have the same 
advantages with all tables in one database.  The biggest drawback to this 
scheme is the existing problem with prefixes:  the install scripts (and update 
scripts?  or is that fixed?) are not able to handle prefixes in an easy and 
automated way.  It takes hand editing by someone with a clue about SQL.

This separation might provide other benefits, too.  It might make caching or 
other optimizations easier, for instance.

..chrisxj