[development] Database schema abstraction and *reflection* (was: Referential integrity -- finally?)
Barry Jaspan
barry at jaspan.org
Fri Jan 26 00:03:12 UTC 2007
I raised the issue of database schema abstraction last summer, and I
know that other people raised it before me. The arguments this time
seem to be the same as the used to be (though apparently Dries is
closer to being persuaded :-).
I would like to elaborate on my primary reason for supporting a
database schema abstraction layer (not for data manipulation (SELECT,
etc.), just for CREATE TABLE, ALTER, etc): database schema reflection.
Schema reflection will provide Drupal the ability to understand and
manipulate its own data schema (in the flavor of the Forms
API). Every table's structure will be defined in a data structure
(just like a form), indicating its columns and data types, keys,
etc. I envision something like this (I wrote a perl script to
automatically generate this structure by parsing 4.7's
database.4.0.mysql. Other methods are possible, but it only has to
be done once):
function system_install() {
//
// Table structure for table 'access'
//
$tables['access'] =
array
('#cols' => array
(array('aid', 'tinyint', NOT_NULL, '#disp_width' => 10),
array('mask', 'varchar(255)', NOT_NULL, '#default' => ''),
array('type', 'varchar(255)', NOT_NULL, '#default' => ''),
array('status', 'tinyint', NOT_NULL, '#disp_width' => 2,
'#default' => '0')),
'#keys' => array
(array('', 'aid', 'PRIMARY')));
//
// other tables defined here
//
return $tables;
}
Every column has a name, type, and NULL/NOT NULL setting, so those
array elements are positional, and any additional/optional attributes
are set with #name elements (this is to save space and typing, though
all elements could obviously all be keyed to mimic Forms
API). Similarly, all keys have a name, column list, and index type,
with possible optional attributes.
Every module, including system, can return an array like this. When
_install hooks are called, the caller merges the result and hands the
array off to a DBMS engine driver that translates the schema into
appropriate CREATE TABLE, ALTER, etc. statements.
I believe this is much better than db_create_table(),
db_alter_table(), etc. for the same reason that the Forms API is
better than pre-4.7's form_button(), form_checkbox(), etc. A
data-driven approach will prove a better design than a procedural
approach. Nothing prevents db_create_table() etc. from existing, of
course; in fact they probably will, but usually only be called by the
DBMS driver.
Any module author that does not want to play the game just continues
to call db_query('CREATE TABLE') in hook_install(); it is not like we
can prevent it. Similarly, _update_n() functions can remain
unchanged. If they don't return a structured array, they won't be
involved in the abstraction layer.
This is not a least-common-denominator approach. Nothing prevents
DBMS-specific settings:
$tables['some_table'] =
array
('#mssql_proprietary_setting' => 'value',
...);
Schema version updates, at least the normal cases (add, rename,
remove columns) can be handled just by adding version info into the
table. For example:
$tables['some_table'] =
array
('#version' => 2,
'#cols' => array
(// all the original columns would be here, and now ...
array('newcol', 'int', NOT_NULL, '#version_added' => 2),
For version updates that require custom code, we can add
$tables['some_table'] =
array
('#version' => 4,
'#version_custom_updates' => array(2, 6), ...);
so when the schema updates from 1 to 2 or 5 to 6,
module_update_some_table_2 or _6 is called. (As I write this, I
think I realize that the #version keys should be in $tables, not
$tables['some_table'].)
What does this buy us?
- Several modules (e.g. Views) already need schema reflection, this
just centralizes the information. It makes it much easier to create
new modules that need reflection. A module that wants to support
Views or other similar modules can hang any necessary data off its
primary $tables array. In fact, many modules will probably not have
to do anything more to support such reflection-modules.
- Code size reduction and elimination of duplicate code. My
automatically-generated schema for core (with all comments) is 110%
the size of database.4.0.mysql, but it obsoletes database.4.1.mysql,
database.psql, etc. The same reduction would happen in 5.0's system_install().
- A DBMS driver can use inspection statements (SHOW TABLES, SHOW
COLUMNS) to actually validate the database layout against Drupal's
internal schema, reporting errors.
- It is a huge step towards enabling partial data migration across
Drupal systems (this is a whole other topic).
- hook_table_alter.
- The Forms API made all kinds of things possible that weren't
before, not all of them foreseen. This will, too.
Yes, this is somewhat more complicated than writing a single MySQL
CREATE TABLE statement. Supporting Views is extra effort, too, but
has a big payoff for module developers; so will this. I assert that
it is easier to create a structure like this once than it is to write
AND MAINTAIN separate statements for MySQL, Postgres, Oracle, Sybase,
MS SQL Server, and god knows what else. And module authors can keep
using CREATE TABLE if they really want to.
Anyway, that's my $.02.
Thanks,
Barry
More information about the development
mailing list