[development] Database schema abstraction and reflection (was: Referential integrity -- finally?)

Fri Jan 26 00:03:12 UTC 2007

I raised the issue of database schema abstraction last summer, and I 
know that other people raised it before me.  The arguments this time 
seem to be the same as the used to be (though apparently Dries is 
closer to being persuaded :-).

I would like to elaborate on my primary reason for supporting a 
database schema abstraction layer (not for data manipulation (SELECT, 
etc.), just for CREATE TABLE, ALTER, etc): database schema reflection.

Schema reflection will provide Drupal the ability to understand and 
manipulate its own data schema (in the flavor of the Forms 
API).  Every table's structure will be defined in a data structure 
(just like a form), indicating its columns and data types, keys, 
etc.  I envision something like this (I wrote a perl script to 
automatically generate this structure by parsing  4.7's 
database.4.0.mysql.  Other methods are possible, but it only has to 
be done once):

function system_install() {
   //
   // Table structure for table 'access'
   //
   $tables['access'] =
     array
       ('#cols' => array
        (array('aid', 'tinyint', NOT_NULL, '#disp_width' => 10),
         array('mask', 'varchar(255)', NOT_NULL, '#default' => ''),
         array('type', 'varchar(255)', NOT_NULL, '#default' => ''),
         array('status', 'tinyint', NOT_NULL, '#disp_width' => 2, 
'#default' => '0')),
        '#keys' => array
        (array('', 'aid', 'PRIMARY')));

   //
   // other tables defined here
   //

   return $tables;
}

Every column has a name, type, and NULL/NOT NULL setting, so those 
array elements are positional, and any additional/optional attributes 
are set with #name elements (this is to save space and typing, though 
all elements could obviously all be keyed to mimic Forms 
API).  Similarly, all keys have a name, column list, and index type, 
with possible optional attributes.

Every module, including system, can return an array like this.  When 
_install hooks are called, the caller merges the result and hands the 
array off to a DBMS engine driver that translates the schema into 
appropriate CREATE TABLE, ALTER, etc. statements.

I believe this is much better than db_create_table(), 
db_alter_table(), etc. for the same reason that the Forms API is 
better than pre-4.7's form_button(), form_checkbox(), etc.  A 
data-driven approach will prove a better design than a procedural 
approach.  Nothing prevents db_create_table() etc. from existing, of 
course; in fact they probably will, but usually only be called by the 
DBMS driver.

Any module author that does not want to play the game just continues 
to call db_query('CREATE TABLE') in hook_install(); it is not like we 
can prevent it.  Similarly, _update_n() functions can remain 
unchanged.  If they don't return a structured array, they won't be 
involved in the abstraction layer.

This is not a least-common-denominator approach.  Nothing prevents 
DBMS-specific settings:

   $tables['some_table'] =
     array
       ('#mssql_proprietary_setting' => 'value',
        ...);

Schema version updates, at least the normal cases (add, rename, 
remove columns) can be handled just by adding version info into the 
table.  For example:

   $tables['some_table'] =
     array
       ('#version' => 2,
        '#cols' => array
        (// all the original columns would be here, and now ...
         array('newcol', 'int', NOT_NULL, '#version_added' => 2),

For version updates that require custom code, we can add

   $tables['some_table'] =
     array
       ('#version' => 4,
        '#version_custom_updates' => array(2, 6), ...);

so when the schema updates from 1 to 2 or 5 to 6, 
module_update_some_table_2 or _6 is called.  (As I write this, I 
think I realize that the #version keys should be in $tables, not 
$tables['some_table'].)

What does this buy us?

- Several modules (e.g. Views) already need schema reflection, this 
just centralizes the information.  It makes it much easier to create 
new modules that need reflection.  A module that wants to support 
Views or other similar modules can hang any necessary data off its 
primary $tables array.  In fact, many modules will probably not have 
to do anything more to support such reflection-modules.

- Code size reduction and elimination of duplicate code.  My 
automatically-generated schema for core (with all comments) is 110% 
the size of database.4.0.mysql, but it obsoletes database.4.1.mysql, 
database.psql, etc.  The same reduction would happen in 5.0's system_install().

- A DBMS driver can use inspection statements (SHOW TABLES, SHOW 
COLUMNS) to actually validate the database layout against Drupal's 
internal schema, reporting errors.

- It is a huge step towards enabling partial data migration across 
Drupal systems (this is a whole other topic).

- hook_table_alter.

- The Forms API made all kinds of things possible that weren't 
before, not all of them foreseen.  This will, too.

Yes, this is somewhat more complicated than writing a single MySQL 
CREATE TABLE statement.  Supporting Views is extra effort, too, but 
has a big payoff for module developers; so will this.  I assert that 
it is easier to create a structure like this once than it is to write 
AND MAINTAIN separate statements for MySQL, Postgres, Oracle, Sybase, 
MS SQL Server, and god knows what else.  And module authors can keep 
using CREATE TABLE if they really want to.

Anyway, that's my $.02.

Thanks,

Barry

[development] Database schema abstraction and *reflection* (was: Referential integrity -- finally?)

[development] Database schema abstraction and reflection (was: Referential integrity -- finally?)