[development] Database schema abstraction and *reflection* (was: Referential integrity -- finally?)

Jeremy Epstein jazepstein at gmail.com
Fri Jan 26 01:34:01 UTC 2007

This is a really great proposal. I agree that Drupal needs a DB schema
abstraction system, for all the reasons that you listed, Barry.

I'm not suggesting that we should do more than a DB schema abstraction
system. I realise that this is all that it's practical to try and get
into core. However, I am pointing out that there is the possibility to
do more. The entity definition system in the importexportapi module is
rich enough to support automated CRUD operations throughout Drupal,
and of course to support automated import/export of data to various
text formats (e.g. XML, CSV). However, it is also much heavier than
the proposed DB schema abstraction system, and obviously needs a lot
of thought and a lot of work before it could be put into core (if it
ever will be).


On 1/26/07, Barry Jaspan <barry at jaspan.org> wrote:
> I raised the issue of database schema abstraction last summer, and I
> know that other people raised it before me.  The arguments this time
> seem to be the same as the used to be (though apparently Dries is
> closer to being persuaded :-).
> I would like to elaborate on my primary reason for supporting a
> database schema abstraction layer (not for data manipulation (SELECT,
> etc.), just for CREATE TABLE, ALTER, etc): database schema reflection.
> Schema reflection will provide Drupal the ability to understand and
> manipulate its own data schema (in the flavor of the Forms
> API).  Every table's structure will be defined in a data structure
> (just like a form), indicating its columns and data types, keys,
> etc.  I envision something like this (I wrote a perl script to
> automatically generate this structure by parsing  4.7's
> database.4.0.mysql.  Other methods are possible, but it only has to
> be done once):
> function system_install() {
>    //
>    // Table structure for table 'access'
>    //
>    $tables['access'] =
>      array
>        ('#cols' => array
>         (array('aid', 'tinyint', NOT_NULL, '#disp_width' => 10),
>          array('mask', 'varchar(255)', NOT_NULL, '#default' => ''),
>          array('type', 'varchar(255)', NOT_NULL, '#default' => ''),
>          array('status', 'tinyint', NOT_NULL, '#disp_width' => 2,
> '#default' => '0')),
>         '#keys' => array
>         (array('', 'aid', 'PRIMARY')));
>    //
>    // other tables defined here
>    //
>    return $tables;
> }
> Every column has a name, type, and NULL/NOT NULL setting, so those
> array elements are positional, and any additional/optional attributes
> are set with #name elements (this is to save space and typing, though
> all elements could obviously all be keyed to mimic Forms
> API).  Similarly, all keys have a name, column list, and index type,
> with possible optional attributes.
> Every module, including system, can return an array like this.  When
> _install hooks are called, the caller merges the result and hands the
> array off to a DBMS engine driver that translates the schema into
> appropriate CREATE TABLE, ALTER, etc. statements.
> I believe this is much better than db_create_table(),
> db_alter_table(), etc. for the same reason that the Forms API is
> better than pre-4.7's form_button(), form_checkbox(), etc.  A
> data-driven approach will prove a better design than a procedural
> approach.  Nothing prevents db_create_table() etc. from existing, of
> course; in fact they probably will, but usually only be called by the
> DBMS driver.
> Any module author that does not want to play the game just continues
> to call db_query('CREATE TABLE') in hook_install(); it is not like we
> can prevent it.  Similarly, _update_n() functions can remain
> unchanged.  If they don't return a structured array, they won't be
> involved in the abstraction layer.
> This is not a least-common-denominator approach.  Nothing prevents
> DBMS-specific settings:
>    $tables['some_table'] =
>      array
>        ('#mssql_proprietary_setting' => 'value',
>         ...);
> Schema version updates, at least the normal cases (add, rename,
> remove columns) can be handled just by adding version info into the
> table.  For example:
>    $tables['some_table'] =
>      array
>        ('#version' => 2,
>         '#cols' => array
>         (// all the original columns would be here, and now ...
>          array('newcol', 'int', NOT_NULL, '#version_added' => 2),
> For version updates that require custom code, we can add
>    $tables['some_table'] =
>      array
>        ('#version' => 4,
>         '#version_custom_updates' => array(2, 6), ...);
> so when the schema updates from 1 to 2 or 5 to 6,
> module_update_some_table_2 or _6 is called.  (As I write this, I
> think I realize that the #version keys should be in $tables, not
> $tables['some_table'].)
> What does this buy us?
> - Several modules (e.g. Views) already need schema reflection, this
> just centralizes the information.  It makes it much easier to create
> new modules that need reflection.  A module that wants to support
> Views or other similar modules can hang any necessary data off its
> primary $tables array.  In fact, many modules will probably not have
> to do anything more to support such reflection-modules.
> - Code size reduction and elimination of duplicate code.  My
> automatically-generated schema for core (with all comments) is 110%
> the size of database.4.0.mysql, but it obsoletes database.4.1.mysql,
> database.psql, etc.  The same reduction would happen in 5.0's system_install().
> - A DBMS driver can use inspection statements (SHOW TABLES, SHOW
> COLUMNS) to actually validate the database layout against Drupal's
> internal schema, reporting errors.
> - It is a huge step towards enabling partial data migration across
> Drupal systems (this is a whole other topic).
> - hook_table_alter.
> - The Forms API made all kinds of things possible that weren't
> before, not all of them foreseen.  This will, too.
> Yes, this is somewhat more complicated than writing a single MySQL
> CREATE TABLE statement.  Supporting Views is extra effort, too, but
> has a big payoff for module developers; so will this.  I assert that
> it is easier to create a structure like this once than it is to write
> AND MAINTAIN separate statements for MySQL, Postgres, Oracle, Sybase,
> MS SQL Server, and god knows what else.  And module authors can keep
> using CREATE TABLE if they really want to.
> Anyway, that's my $.02.
> Thanks,
> Barry

More information about the development mailing list