Separating configuration vs data tables
One thing I'm not aware of a good way to do with Drupal is sandbox testing or controlled change management on a running system. There are some posts talking about how have a test site and test database for development, but you always have the problem of re-integrating your changes back into the running system. You either have to (a) re-make the changes on the production system one step at a time or (b) transfer the development database to the production database, losing any new content/users/data. I do (a), but it is annoying and error prone. If someone has a better system, I'd love to hear it. My suggestion is to identify which database tables contain administrative or configuration data vs. which tables contain content and then store them in separate databases (brief research indicates that MySQL supports cross-database queries but Postgres does not, so this functionality would be turned off with Postgres). Ideally, it would then be possible to clone both databases, add modules or change settings in the sandbox until happy, and then copy back only the admin/config database and any filesystem mods. In one step, the production system would be upgraded. We would have to decide which tables fall into each category and possibly refactor some tables if they span both categories. Perhaps it is not possible to separate them all cleanly in this way, but it seems worth trying. As for implementation, we already have the $db_prefix array that can be used to convert {tablename} syntax into database-qualified values. That could be set up manually. Even better would be to have this separation work automatically for core and contributed modules by specifying which category each table belongs in a data driven way: function system_schema() { $tables['variable']['#category'] = DB_CAT_CONFIG; ...; } function node_schema() { $tables['node']['#category'] = DB_CAT_DATA; ...; } Comments? Barry
My suggestion is to identify which database tables contain administrative or configuration data vs. which tables contain content and then store them in separate databases (brief research indicates that MySQL supports cross-database queries but Postgres does not, so this functionality would be turned off with Postgres). Ideally, it would then be possible to clone both databases, add modules or change settings in the sandbox until happy, and then copy back only the admin/config database and any filesystem mods. In one step, the production system would be upgraded.
We would have to decide which tables fall into each category and possibly refactor some tables if they span both categories. Perhaps it is not possible to separate them all cleanly in this way, but it seems worth trying.
right and right. it would be outstanding if we could pull this off. Migrating changes to production is a major headache, and this would make it really easy to roll forward and even roll back. i used to work at a company which worked this way. we called relatively static tables 'domain' tables and relatively dynamic 'transactional'. i like your terms of configuration and content better i think. yes, that prefix trick is very cool and requires no changes from core or contrib. but i have a better idea. Lets store this mapping in a new configuration table. Then, modules or the installer can write into this table. we just need to run this pseudocode in settings.php where the $db_prefix stuff is now: // set prefixes for each table as specified by the prefixes table $sql = SELECT prefix, name FROM prefixes; $result = db_query($sql); while ($row = db_fetch_object($result) { $db_prefix[$row->name] = $row->prefix; } from there on, queries will always hit the right DB. as a next step, i sigest identifying which tables go where and resolving splitting up ambigous tables
On 6/9/06, Moshe Weitzman <weitzman@tejasa.com> wrote:
My suggestion is to identify which database tables contain administrative or configuration data vs. which tables contain content and then store them in separate databases (brief research indicates that MySQL supports cross-database queries but Postgres does not, so this functionality would be turned off with Postgres). Ideally, it would then be possible to clone both databases, add modules or change settings in the sandbox until happy, and then copy back only the admin/config database and any filesystem mods. In one step, the production system would be upgraded.
We would have to decide which tables fall into each category and possibly refactor some tables if they span both categories. Perhaps it is not possible to separate them all cleanly in this way, but it seems worth trying.
right and right. it would be outstanding if we could pull this off. Migrating changes to production is a major headache, and this would make it really easy to roll forward and even roll back.
i used to work at a company which worked this way. we called relatively static tables 'domain' tables and relatively dynamic 'transactional'. i like your terms of configuration and content better i think.
yes, that prefix trick is very cool and requires no changes from core or contrib. but i have a better idea. Lets store this mapping in a new configuration table. Then, modules or the installer can write into this table. we just need to run this pseudocode in settings.php where the $db_prefix stuff is now:
// set prefixes for each table as specified by the prefixes table $sql = SELECT prefix, name FROM prefixes; $result = db_query($sql); while ($row = db_fetch_object($result) { $db_prefix[$row->name] = $row->prefix; }
from there on, queries will always hit the right DB.
as a next step, i sigest identifying which tables go where and resolving splitting up ambigous tables
This is something I was thinking about the other day, but from a different angle. Some CMS have built in support for "production", "test" and "development" environments. I think RoR and Plone have this. Then I thought that multi site is so easy in Drupal that one can use the same codebase with 3 different databases (or 3 different prefixes in the same db). Now what Barry is proposing "configuration" (or "system") vs. "content" is a really good differentiator here, and could be used in many situations, such as sharing tables between sites, upgrades, ...etc.
Moshe Weitzman wrote:
My suggestion is to identify which database tables contain administrative or configuration data vs. which tables contain content and then store them in separate databases (brief research indicates that MySQL supports cross-database queries but Postgres does not, so this functionality would be turned off with Postgres). Ideally, it would then be possible to clone both databases, add modules or change settings in the sandbox until happy, and then copy back only the admin/config database and any filesystem mods. In one step, the production system would be upgraded.
We would have to decide which tables fall into each category and possibly refactor some tables if they span both categories. Perhaps it is not possible to separate them all cleanly in this way, but it seems worth trying.
right and right. it would be outstanding if we could pull this off. Migrating changes to production is a major headache, and this would make it really easy to roll forward and even roll back.
yes, that prefix trick is very cool and requires no changes from core or contrib. but i have a better idea. Lets store this mapping in a new configuration table. Then, modules or the installer can write into this table. we just need to run this pseudocode in settings.php where the $db_prefix stuff is now:
// set prefixes for each table as specified by the prefixes table $sql = SELECT prefix, name FROM prefixes; $result = db_query($sql); while ($row = db_fetch_object($result) { $db_prefix[$row->name] = $row->prefix; }
from there on, queries will always hit the right DB.
as a next step, i sigest identifying which tables go where and resolving splitting up ambigous tables
Good idea! Below, some thoughts on making it better. First, a bit of clarification on Postgres vs. MySQL. Postgres has more capabilities than MySQL, and follows the SQL standard more closely. What MySQL lacks is the 3rd tier of the object hierarchy in standard SQL. Standard SQL specifies a database object by a triplet usually named <database>.<schema>.<object> or sometimes <database>.<owner>.<object>. Postgres follows this model. For example, one might have a table named "node" in a schema named "content" in a database instance called "test", i.e. test.content.node. It is true that with Postgres, each database instance has its own connection server, and thus SQL queries cannot cross database instances. However, they can easily span multiple schema. MySQL, in contrast is "one-third lame" so to speak. It only has what it *calls* a database (but what is really a schema) and an object. One MySQL server on one port has multiple MySQL-named databases (schema) with objects (e.g. tables) within them. So, no need to leave Postgres out of this discussion. Let's not forget people who are hosting Drupal sites at hosts which do not allow multiple MySQL databases, or perhaps only provide a limited number and the user is up against that limit already. Instead, I suggest the switch code between "content" and "configuration" tables be smart enough to select either separate Postgres schema, separate MySQL databases, OR different *table prefixes*. The one could have the same advantages with all tables in one database. The biggest drawback to this scheme is the existing problem with prefixes: the install scripts (and update scripts? or is that fixed?) are not able to handle prefixes in an easy and automated way. It takes hand editing by someone with a clue about SQL. This separation might provide other benefits, too. It might make caching or other optimizations easier, for instance. ..chrisxj
On 10 Jun 2006, at 4:27 AM, Chris Johnson wrote:
Instead, I suggest the switch code between "content" and "configuration" tables be smart enough to select either separate Postgres schema, separate MySQL databases, OR different *table prefixes*. The one could have the same advantages with all tables in one database. The biggest drawback to this scheme is the existing problem with prefixes: the install scripts (and update scripts? or is that fixed?) are not able to handle prefixes in an easy and automated way. It takes hand editing by someone with a clue about SQL.
not with the install patch that is being proposed for core. it wraps the drupal schema in a system.install, and uses db_query with the {} to create the tables. Also. all contrib modules should be prefix friendly. -- Adrian Rossouw Drupal developer and Bryght Guy http://drupal.org | http://bryght.com
I'd like to see a solution to this problem as well. But there are a few tables that don't fall into just configuration and content - there's also log tables (the drupal one, as well as some module specific ones), and perhaps user tables, and maybe comments. For some kinds of sites where content is all centrally controlled, you may want to put up some new content and test it in a staging environment before pushing it up to your production machine, but you wouldn't want to overwrite those log, comment or user tables. Or you might want to merge up your new content, if some of your content is controlled, and other parts is contributed from the users. In addition, when you start thinking about flexinode/cck and all the different ways that it's used (sometimes table content is considered content, sometimes config) or taxonomy (fixed categories vs. freetagging), the division between content and configuration becomes very murky. As an interesting example, I just upgraded my Ubuntu to dapper (an aside - it was amazing!). Similarly to freebsd as well, if you've customized your config files, it'll ask you if you want to keep your existing ones, or replace them with the new ones. So I wonder if something like that would be relatively straightforward and useful, even if there will always be problems in specific cases. Conclusion: rather than trying to categorize in detail all the different tables, I'd propose allowing them to be in just three different categories: a. transactional (a good word after all ...) b. content c. configuration where the default is 'content'. This imaginary script would then: 1. ignore all the log tables 2. interactively allow each of the content tables to be ignored, merged or replaced, after an analysis of the differences. 3. replace all the configuration tables (and save the difference somewhwere as a sql script that could be called to selectively reverse some changes? ah, now it's getting ambitious). Some problems would occur if you're merging node tables - you'd need to update all the related id's in the modules for example. On 6/9/06, Moshe Weitzman <weitzman@tejasa.com> wrote:
My suggestion is to identify which database tables contain administrative or configuration data vs. which tables contain content and then store them in separate databases (brief research indicates that MySQL supports cross-database queries but Postgres does not, so this functionality would be turned off with Postgres). Ideally, it would then be possible to clone both databases, add modules or change settings in the sandbox until happy, and then copy back only the admin/config database and any filesystem mods. In one step, the production system would be upgraded.
We would have to decide which tables fall into each category and possibly refactor some tables if they span both categories. Perhaps it is not possible to separate them all cleanly in this way, but it seems worth trying.
right and right. it would be outstanding if we could pull this off. Migrating changes to production is a major headache, and this would make it really easy to roll forward and even roll back.
i used to work at a company which worked this way. we called relatively static tables 'domain' tables and relatively dynamic 'transactional'. i like your terms of configuration and content better i think.
yes, that prefix trick is very cool and requires no changes from core or contrib. but i have a better idea. Lets store this mapping in a new configuration table. Then, modules or the installer can write into this table. we just need to run this pseudocode in settings.php where the $db_prefix stuff is now:
// set prefixes for each table as specified by the prefixes table $sql = SELECT prefix, name FROM prefixes; $result = db_query($sql); while ($row = db_fetch_object($result) { $db_prefix[$row->name] = $row->prefix; }
from there on, queries will always hit the right DB.
as a next step, i sigest identifying which tables go where and resolving splitting up ambigous tables
-- Alan Dixon, Web Developer http://alan.g.dixon.googlepages.com/
On 15-Jun-06, at 10:21 AM, Alan Dixon wrote:
Conclusion: rather than trying to categorize in detail all the different tables, I'd propose allowing them to be in just three different categories: a. transactional (a good word after all ...) b. content c. configuration
where the default is 'content'. This imaginary script would then:
1. ignore all the log tables 2. interactively allow each of the content tables to be ignored, merged or replaced, after an analysis of the differences. 3. replace all the configuration tables (and save the difference somewhwere as a sql script that could be called to selectively reverse some changes? ah, now it's getting ambitious).
Some problems would occur if you're merging node tables - you'd need to update all the related id's in the modules for example.
Alan -- thanks for this write up. I think the staging/production server issue is one that merits attention. A lot of times, it means we're running without a safety net -- it is too hard to merge (config, content) changes between a dev site and a live site...so lots of people just don't, or do extra work in re-doing the same config steps on a live site. Not sure how to go about this...a few folks need to bang heads about the best way to get this done. -- Boris Mann Vancouver 778-896-2747 San Francisco 415-367-3595 SKYPE borismann http://www.bryght.com
This is what I call "fixtures". Yes, still vapour, but it wil be there, as part of sympal scripts. Op vrijdag 9 juni 2006 15:37, schreef Barry Jaspan:
I do (a), but it is annoying and error prone. If someone has a better system, I'd love to hear it.
I try to extract, one by one all content that is considered part of a "configuration". This can potentially be users, and nodes too. As well as categories etc. These are then dumped into PHP files, as PHP. nodes.fixture $node->title = 'My Cool Frontpage'; $node->body = 'This is printed on my frontpage node'; $node->name = 'Site Administrator'; $nodes[] = $node; .... users.fixture $user->name = 'Site Administrator'; etc note that we do not have stufff like Dates, uids and even nids. This is what makes fixtures far better then SQL dumps. A SQL dump is so much intertwined that it is near impossible to select only one or two tables and not all the others. Nodes need users; users need roles. Nodes need terms; terms need a whole range of tables. Nodes need revisions. Users need counters. Nodes need counters; etc etc etc. I guess you can go on untill you have.... A full SQL dump ;) Now, still vapourware, in that matter, that I really want to find a central way to halde all tis, but Drupal is very incosistent. So the result is that I need to develop a wrapper around all 'objects' (users, terms, nodes etc) S that my fixtures are consistent and unified, while the backend (Drupal) can remain inconsistent. Bèr -- PGP ber@webschuur.com http://www.webschuur.com/sites/webschuur.com/files/ber_webschuur.asc Drupal upgrade repareert kritiek beveiligingslek: http://help.sympal.nl/drupal_upgrade_repareert_kritiek_beveiligingslek
Now, still vapourware, in that matter, that I really want to find a central way to halde all tis, but Drupal is very incosistent. So the result is that I need to develop a wrapper around all 'objects' (users, terms, nodes etc) S that my fixtures are consistent and unified, while the backend (Drupal) can remain inconsistent.
What is inconsistent? And why can't it be made consistent instead? -- Dries Buytaert :: http://www.buytaert.net/
Op zaterdag 10 juni 2006 00:18, schreef Dries Buytaert:
What is inconsistent? Example: node_load($uid) vs user_load(array('uid' => $uid)) :: parameters of various load/save/update are very incosistent. Some eat objects, other only certain arrays, again others dont eat anything, but grab it from the form api.
taxonomy_get_term() vs foo_load :: naming of apis. And a lot more small, details.
And why can't it be made consistent instead?
It can. It will. Given time. Categories module is one of the ways to get this. But its not a real problem. It just needs work and code, not even a lot of talk and reviews, I think. Though Drupal itself does not need this, It works fine as it it, IMO, which is why no-one has not sat down for this, I think. Its just the odd project like these fixtures that require consistent apis. (Well, my brain would not mind either :) ) Bèr -- | Bèr Kessels | webschuur.com | website development | | Jabber & Google Talk: ber@jabber.webschuur.com | http://bler.webschuur.com | http://www.webschuur.com | Drupal repareert wederom een kritiek veiligheidslek: http://help.sympal.nl/drupal_repareert_wederom_een_kritiek_veiligheidslek
On Jun 9, 2006, at 6:37 AM, Barry Jaspan wrote:
My suggestion is to identify which database tables contain administrative or configuration data vs.
CivicSpace has spent about 60 hours this week building the tools for extracting the configuration of a site into .profile and into a .install. Here's what we are extracting and then making available as a default configuration in a .profile: 1) Modules from the System Table are enabled in the .profile function gojoingo_profile_modules() { return array('system', 'block', 'civicrm', 'filter', 'help', 'menu', 'node', 'page', 'taxonomy', 'user', 'watchdog', 'blog'); } The rest are set using a .install 2) Menu's from the Menu Table: use the devel module to reset your menu's and diff them DB dumps 3) Menu's for primary navigation: We actually point these to views : My Groups, My Events... 3) Blocks from the Blocks Table 4) Access control from the access table 5) Module configuration settings: we have written a script called variable-export that export's your variables into a list of variable_set('variable', value); variable_set('comment_content-event', '2'); variable_set('event_nodeapi_content-event', 'all'); variable_set('image_attach_content-event', '0'); variable_set('upload_content-event', '1'); variable_set('location_weight_content-event', '9'); variable_set('location_name_content-event', '1'); so that we can just add to the .install 6) Localization exporting to make Drupal terminology user friendly - we can do it manually but having some problems populating using the .install 7) Views are proving tricky to add in the .install with views_import_view_submit. Look for help here. We aren't that far away from having a module that just dump's your entire site configuration into a .profile and a .install. Then you can just add the files to the Drupal install profile system and presto you have a Drupal distribution or you have a back up of your site. If you want to collaborate and can contribute some hours that would be very helpful. Kieran CivicSpace
participants (10)
-
Adrian Rossouw -
Alan Dixon -
Barry Jaspan -
Boris Mann -
Bèr Kessels -
Chris Johnson -
Dries Buytaert -
Khalid B -
Kieran Lal -
Moshe Weitzman