Let's finally document our schema already!
Harry Slaughter, JonBob, myself, and others have tried countless times to come up with a workable solution for documenting our database schema. This either resulted in static images and so forth that were impossible to maintain, or trying to work around MySQL's comment length limitations. However, with the Schema API, the day is saved! Joy! Earl came up with this slick idea in the Views 2.0 schema to stick a 'description' attribute on each field in the schema file. Drupal will of course ignore this, having no knowledge of what a description is, but human beings (and probably eventually parsers) can read this information in order to determine what the various fields are for. Additionally, the documentation is *part of the source code*, which means we can deny schema additions until they have documentation associated, just as we deny new functions without PHPDoc. And finally, because the descriptions are stored in t() functions, that means the documentation is (potentially?) translatable to other languages. I've taken this idea a couple steps further (by adding table descriptions as well as field descriptions, and adding in foreign key information), and have started documenting all core schema files here: http://drupal.org/node/164983 I'd love help from: - Anyone at all who has some time on their hands and would like to learn Drupal's schema inside and out by helping to document the remainder of the files, using the existing patch as a guide. - Core committers, to give a thumbs up/thumbs down to this approach before we dump a bunch of time into it. ;) - Schema API folks, to give guidance, esp. on foreign key representation. - Experts on the lesser-used aspects of Drupal (aggregator module, for instance) who can clarify what some of the fields are for. - Testers, to make sure the patch doesn't horribly break things. ;) Thanks! -Angie
On 8/8/07, Angela Byron <drupal-devel@webchick.net> wrote:
Earl came up with this slick idea in the Views 2.0 schema to stick a 'description' attribute on each field in the schema file. Drupal will of course ignore this, having no knowledge of what a description is, but human beings (and probably eventually parsers) can read this information in order to determine what the various fields are for.
Schema information should be parsed by the API module at some point. However, PHP code is not friendly for this. There are three options: 1. Execute the code and extract the information. Would be really easy and really insecure. 2. Regular expressions. Only works if we strictly conform to a certain syntax. 3. PHP's tokenizer. Like regular expressions, but we have a list of tokens and what PHP thinks they are. A bit smarter, but still fails if someone mixes up the syntax. None of these options is especially good. Documentation comments would be much better. They are in code, but have the advantage of not being code. The challenge is figuring out a convenient and consistent comment syntax. -- Neil Drumm http://delocalizedham.com
On 9-Aug-07, at 1:42 AM, Neil Drumm wrote:
Documentation comments would be much better. They are in code, but have the advantage of not being code. The challenge is figuring out a convenient and consistent comment syntax.
As long as it's in-line with the schema source code, it doesn't matter much to me what format it takes. But it **MUST** be in the actual source code of the .schema file, and not stored in a separate document (like the FAPI reference) or it will very quickly fall hopelessly out of date with what the code actually says (like the FAPI reference :P~). In the meantime, though, we still do need the table/field descriptions before we can do either one, so we can still work away on that while we come to a consensus about the comment syntax. Past attempts: - http://drupal.org/node/28046: uses the MySQL COMMENT attribute, limited to 60 characters. We can no longer use this method, because we don't have actual SQL doing create table statements, and the character limit sucked, anyway. - http://drupal.org/node/79874: uses a common comment format ( -- (PK), -- (FK) -> session.sid, -- Description) to the SQL. Again, we're no longer storing SQL, so this isn't viable, and it also shares the fragile problem that regexing it out of the schema files has. I'm having kind of a hard time making sense of this document, but I "think" the "Putting documentation after members" section of the Doxygen manual [http://www.stack.nl/~dimitri/doxygen/docblocks.html] seems to have a special syntax for what kind of we're trying to do: int var; /**< Detailed description after the member */ Of course, fields and tables aren't members of objects, but they are "kinda sorta" the same thing. Meh, I dunno. Open to other suggestions... -Angie
On 8/8/07, Angela Byron <drupal-devel@webchick.net> wrote:
On 9-Aug-07, at 1:42 AM, Neil Drumm wrote:
Documentation comments would be much better. They are in code, but have the advantage of not being code. The challenge is figuring out a convenient and consistent comment syntax.
As long as it's in-line with the schema source code, it doesn't matter much to me what format it takes. But it **MUST** be in the actual source code of the .schema file, and not stored in a separate document (like the FAPI reference) or it will very quickly fall hopelessly out of date with what the code actually says (like the FAPI reference :P~).
In the code is where documentation comments belong. If we get a decent array documentation construct it might be useful elsewhere. Drupal tends to accumulate arrays. -- Neil Drumm http://delocalizedham.com
huh? the mysql database driver can choose to truncate and inject the description into its CREATE TABLE statements. we don't have to impose a max character restriction nor force any other driver to use this data. the description stays in schema definition as you wisely proposed.
3. PHP's tokenizer. Like regular expressions, but we have a list of tokens and what PHP thinks they are. A bit smarter, but still fails if someone mixes up the syntax.
potx uses the tokenizer to extract t()'ed strings from the source code, IIRC. I think the Tokenizer is the way to go for this documentation issue. We can even re-use parts of potx module for discovering the t() calls. Also, you argument with syntax is not really valid. With almost every imaginable syntax, it is possible to mess something up.
Documentation comments would be much better. They are in code, but have the advantage of not being code. The challenge is figuring out a convenient and consistent comment syntax.
If we want to display the documentation somewhere else than in the source code, we'd also have to extract them somehow (possibly using regular expressions), which is not really different from writing them as actual code in the schema file. And your argument about messing up the syntax is especially true for this, since PHP doesn't perform "syntax checking" for comments. If documentation is provided as code, we can at least be sure that the file parses correctly. Konstantin Käfer — http://kkaefer.com/
Regarding diagrams ... Frederic Marand (fgm) has done some of those done in the past, albeit for 4.7: http://blog.riff.org/2006_05_14_data_model_for_drupal_4_7_core One for taxonomy (4.6) http://blog.riff.org/2005_11_27_grokking_drupal_the_taxonomy_system He also did one for ecommerce: http://blog.riff.org/2006_10_07_grokking_drupal_ecommerce_dependencies And project: http://blog.riff.org/2006_07_28_grokking_drupal_the_project_module_in_drupal... As for a tool to generate diagrams (with some manual work), I had some success with Open Office 2.2 Database which works fine with MySQL (haven't tried Kexi).
Angela Byron wrote:
Earl came up with this slick idea in the Views 2.0 schema to stick a 'description' attribute on
This was not my idea; this is in the schema API docs. I'm not sure if it's used anywhere, but it seemed like a good idea to utilize it.
On 09 Aug 2007, at 07:00, Angela Byron wrote:
Earl came up with this slick idea in the Views 2.0 schema to stick a 'description' attribute on each field in the schema file. Drupal will of course ignore this, having no knowledge of what a description is, but human beings (and probably eventually parsers) can read this information in order to determine what the various fields are for. Additionally, the documentation is *part of the source code*, which means we can deny schema additions until they have documentation associated, just as we deny new functions without PHPDoc. And finally, because the descriptions are stored in t() functions, that means the documentation is (potentially?) translatable to other languages.
Is there a reason why we can't use PHPdoc for this? (I'm not really keen on having to execute random PHP code on api.drupal.org.) -- Dries Buytaert :: http://www.buytaert.net/
On 09 Aug 2007, at 1:58 PM, Dries Buytaert wrote:
Is there a reason why we can't use PHPdoc for this? (I'm not really keen on having to execute random PHP code on api.drupal.org.)
I'm not really sure that we'd need to execute it. We only document core on api, and schema.module makes the existing data structure self documenting (as far as I understand). This same information would be available to all contrib modules, for anybody who installs the schema module, similar to how you can get api documentation by installing the api module. It would even theoretically be possible to have the schema module produce an image (which can then be cached) or svg file (which would be trés cool) from the information it has (ie: foreign keys and the like).
adrian rossouw wrote:
It would even theoretically be possible to have the schema module produce an image (which can then be cached) or svg file (which would be trés cool) from the information it has (ie: foreign keys and the like).
The SVG API Google SoC project by Snufkin might help there :) http://groups.drupal.org/node/5343 Gabor
we could, but it isn't ideal. this will necessitate that every table and field gets defined twice - in php array and phpdoc. true that only the name is duplicated but still - the chance for out of sync increases. anyway, we don't need this on api.drupal.org. people who want a data dictionary just install schema module and they have a data dictionary exactly matching their current site. as barry said, the data dictionary presentation is already done. On 8/9/07, Dries Buytaert <dries.buytaert@gmail.com> wrote:
On 09 Aug 2007, at 07:00, Angela Byron wrote:
Earl came up with this slick idea in the Views 2.0 schema to stick a 'description' attribute on each field in the schema file. Drupal will of course ignore this, having no knowledge of what a description is, but human beings (and probably eventually parsers) can read this information in order to determine what the various fields are for. Additionally, the documentation is *part of the source code*, which means we can deny schema additions until they have documentation associated, just as we deny new functions without PHPDoc. And finally, because the descriptions are stored in t() functions, that means the documentation is (potentially?) translatable to other languages.
Is there a reason why we can't use PHPdoc for this? (I'm not really keen on having to execute random PHP code on api.drupal.org.)
-- Dries Buytaert :: http://www.buytaert.net/
Angie, Glad to see someone else jumping on this idea! See the issue http://drupal.org/node/160599 on this exact topic which I posted three weeks ago, and announced here. Clearly, your posts get more respect than mine do. :-) I'll mark my issue as a duplicate.
Earl came up with this slick idea in the Views 2.0 schema to stick a 'description' attribute on each field in the schema file.
That was me, and it is in the Schema API docs (see http://drupal.org/node/146939, but I'm glad to see Earl using it. :-) I am hoping to see Views 2.0 and Schema work closely together for much synergistic goodness.
I've taken this idea a couple steps further (by adding table descriptions as well as field descriptions, and adding in foreign key information), and have started documenting all core schema files here: http://drupal.org/node/164983
Again, see http://drupal.org/node/146939. Be sure to wrap table names in table and field descriptions in curly braces so that schema.module can display them as hyperlinks to the table's definition.
- Schema API folks, to give guidance, esp. on foreign key representation.
My guidance is not to include foreign key information as part of this issue as it will just complicate it and delay getting the description fields in. I worked on ref integrity a couple months ago, defined a way to represent foreign keys in the schema structure, and wrote the code in the pgsql driver to create the foreign keys. It was actually quite easy to do. However, it turns out that Drupal will need substantial and careful work before referential integrity can be added including changes all over core and contrib; for example, see my previous thread on this list about whether to use RESTRICT or CASCADE deletions and the impliciations for the rest of system. Since we cannot safely *use* foreign key info, I'm not sure it is a good idea to include it in the schema. We could include foreign key info "as documentation" but we'd want to be sure we got the representation right, and it is tricky. First, the foreign keys array must be keyed by table name, not field name, to reasonably support multi-column foreign keys. Here is the format I came up with: $schema['node'] = array( 'fields' => array( 'nid' => array('type' => 'int'), ... ), ); $schema['node_revisions'] = array( 'fields' => array( 'nid' => array('type' => 'int'), 'vid' => array('type' => 'int'), ... ), 'foreign keys => array( 'node' => array('nid' => 'nid'), ), )); $schema['content_type_foobar'] = array( 'fields' => array( 'nid' => array('type' => 'int'), 'vid' => array('type' => 'int'), ... ), 'foreign keys' => array( // redundant, just shows you can have more than one: 'node' => array('nid' => 'nid'), // could just be on vid, but shows you can have multiple columns: 'node_revisions' => array('nid' => 'nid', 'vid' => 'vid'), ), )); As shorthands, when the column name is the same in both tables you can just specify it as value, not as key => value, and when there is exactly one column in the foreign key and its name is the same in both tables, you can specify it as a string instead of an array. So, content_type_foobar could equivalently contain: 'foreign keys' => array( 'node' => 'nid', 'node_revisions => array('nid', 'vid'), ) HOWEVER, there is more to it than this. For example, we need to be able to specify JOINS in the schema that are not representable as foreign keys. For example, we have tables that contain an "object id" (int) and a "object type" (string, e.g. "node"). The join is "object id = node.nid WHERE object type = 'node'". That can't be a foreign key because it is conditional, but we should still represent the relationship in the schema so we can automatically load nodes including this table's information. I haven't thought about the best way to do that yet. Bottom line, don't hold up schema descriptions until we figure out how to represent foreign keys/joins. Thanks, Barry
On 9-Aug-07, at 9:50 AM, Barry Jaspan wrote:
Angie,
Glad to see someone else jumping on this idea! See the issue http://drupal.org/node/160599 on this exact topic which I posted three weeks ago, and announced here. Clearly, your posts get more respect than mine do. :-) I'll mark my issue as a duplicate.
Ack! :( I'm so sorry. Frando told me there was an issue for this already, but I searched with both d.o's issue search and google, and couldn't find it. :( I assumed maybe someone was going to post an issue and then forgot about it. Looks like my keyword skills could use some work. :(
I've taken this idea a couple steps further (by adding table descriptions as well as field descriptions, and adding in foreign key information), and have started documenting all core schema files here: http://drupal.org/node/164983
Again, see http://drupal.org/node/146939. Be sure to wrap table names in table and field descriptions in curly braces so that schema.module can display them as hyperlinks to the table's definition.
That's great! I'll go ahead and represent foreign key info this way then, and skip all the various complications you outline in the rest of your post. Thanks so much! -Angie
participants (10)
-
adrian rossouw -
Angela Byron -
Barry Jaspan -
Dries Buytaert -
Earl Miles -
Gabor Hojtsy -
Khalid Baheyeldin -
Konstantin Käfer -
Moshe Weitzman -
Neil Drumm