[from support] Staging
I'd like to draw the dev list members' attention to a thread from support about staging: http://lists.drupal.org/pipermail/support/2008-August/009492.html
The reason I am sending this here is that I believe there is no structured, well-defined solution to the problem. There have been many attempts to handle it (e.g. the even/odd nid numbering scheme, dbscripts), but most of them are cumbersome and/or limited in many ways.
I believe this is a very common problem, and that the only real solution should come from core. I think the most difficult part is separating content from configuration. The first step would be to define what is content and what is config (for example, is the site name a content property or a config property?). The next step would be to find a way to transfer config from a staging/dev site onto production without overwriting any content that might have been added there during development.
-- Yuval Hager [T] +972-77-341-4155 [@] yuval@avramzon.net
I am also plagued by the same issues. I think it would be extremely beneficial to separate what makes a site go from its content. The data is so mashed up that it is hard to know what ports and what does not.
gina www.wupal.com
On Sat, Aug 9, 2008 at 12:57 AM, Yuval Hager <yuval@avramzon.net> wrote:
I'd like to draw the dev list members' attention to a thread from support about staging: http://lists.drupal.org/pipermail/support/2008-August/009492.html
The reason I am sending this here, is that I believe there is no structured and well defined solution to the problem. There are many trials to handle this (e.g. even/odd nid numbering scheme, dbscripts), but most of them are cumbersome and/or limited in many ways.
I believe this is a very common problem, and that the only real solution should come from core. I think the most difficult part here is to separate content from configuration. The first step would be how to define what is content and what is config (for example - site name - is it a content property or a config property?). The next step then would be to find a way to transfer config from a staging/dev site onto production, without overwriting any content that might have been added there during development.
-- Yuval Hager [T] +972-77-341-4155 [@] yuval@avramzon.net
On 09 Aug 2008, at 2:31 PM, Gina Beisel wrote:
I am also plagued with the same issues. I think it would be extremely beneficial to separate what makes a site go from content. The data is so mashed up it is hard to know what ports and what does not. gina
Actually, I think it would be beneficial for us to discuss this problem on the list, even if it doesn't make it into core per se.
We need to consider this kind of flexibility in our APIs, so that we can make it simpler (or even possible) to do this without having to patch Drupal or manipulate data dumps with external tools (which don't have access to such things as the _schema hook and any other semantics we have, or could put in place, to help solve this problem).
Perhaps we could then build a module, or something for Drush, which could handle the dumping and merging of these partial data sets in a clean manner. It could use the schema information to 'understand' what it's importing and exporting.
Does anyone have any thoughts on the matter? (I know we have ALL been bitten by it before.)
Adrian Rossouw wrote:
Actually, I think it would be beneficial for us to discuss this problem on the list, even if it doesn't make it into core per se.
We need to consider this kind of flexibility in our APIs, so that we can make it simpler (or even possible) to do this without having to patch Drupal, or manipulate data dumps with external tools (that don't have access to such things as the _schema hook and any other semantics we might have, or can put in place to help solve this problem).
Perhaps we could then build a module or something for Drush which could handle the dumping and merging of these partial data sets in a clean manner. It could use the schema information to 'understand' what it's importing / exporting.
Anyone have any thoughts on the matter (as I know we have ALL been bitten by it before).
The problem is not only in the dev->staging->production direction, but also in the other direction. When you change code or config, you want to be able to push the changes into production. But at the same time, you also want to test your changes on the staging/dev server with the latest content (for example, when changing the structure/theming of a content type).
As yhager proposed, having two DBs instead of one may ease things a lot. One DB will hold content: nodes, revisions, files, users... The second DB will hold configuration: views, cck structure, modules, variables... (this list is highly debatable).
This will allow for two-way updating: content is pulled down, while configuration can be easily pushed up.
If there is enough interest, maybe we can still sneak in a BoF session in Szeged, although I believe we'll need much more than one meeting to solve this issue ;)
On 09 Aug 2008, at 4:09 PM, Zohar Stolar wrote:
As yhager proposed, having two DBs instead of one, may ease things a lot. One DB will hold content: nodes, revisions, files, users... The second DB will hold configuration: views, cck structure, modules, variables... (this list is highly debatable).
Prefixing would work just as well for this.
I was thinking of adding properties to the table definition in schema myself, i.e. marking each table as content or configuration.
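To make the schema-property idea concrete, here is a minimal sketch. It assumes a hypothetical 'data_class' key on each table definition; that key is not part of Drupal's Schema API, and the table definitions below are trimmed-down stand-ins for what a hook_schema() implementation would return. The helper tables_by_class() is likewise invented for illustration.

```php
<?php
// Sketch only: 'data_class' is a hypothetical key, not part of Drupal's Schema API.
// The definitions are simplified stand-ins for real hook_schema() output.

function example_schema() {
  return array(
    'node' => array(
      'data_class' => 'content',
      'fields' => array('nid' => array('type' => 'serial')),
    ),
    'comments' => array(
      'data_class' => 'content',
      'fields' => array('cid' => array('type' => 'serial')),
    ),
    'views_view' => array(
      'data_class' => 'configuration',
      'fields' => array('vid' => array('type' => 'serial')),
    ),
    'variable' => array(
      // No 'data_class' here: rows in this table mix content and configuration.
      'fields' => array('name' => array('type' => 'varchar')),
    ),
  );
}

/**
 * Groups table names by their declared class.
 * Tables that declare nothing land in the 'unknown' group.
 */
function tables_by_class(array $schema) {
  $groups = array();
  foreach ($schema as $table => $definition) {
    $class = isset($definition['data_class']) ? $definition['data_class'] : 'unknown';
    $groups[$class][] = $table;
  }
  return $groups;
}

$groups = tables_by_class(example_schema());
```

A dump tool could then export only the tables in one group. The 'unknown' group shows the limit of a per-table flag: a table that mixes both kinds of rows cannot be classified this way.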
On Saturday 09 August 2008, Adrian Rossouw wrote:
On 09 Aug 2008, at 4:09 PM, Zohar Stolar wrote:
As yhager proposed, having two DBs instead of one, may ease things a lot. One DB will hold content: nodes, revisions, files, users... The second DB will hold configuration: views, cck structure, modules, variables... (this list is highly debatable).
Prefixing would work just as well for this.
I was thinking of adding properties to the table definition in schema, myself. IE: content / configuration.
I wish it were that simple. I'm afraid this cannot be solved at the database level alone. Some rows in the variable table might count as content, others as configuration. CCK, aka the-other-part-of-core, would be impossible to dissect based on post-mortem DB analysis. The menu system can be argued about, etc.
-- Yuval Hager [@] yuval@avramzon.net
Yuval Hager wrote:
On Saturday 09 August 2008, Adrian Rossouw wrote:
On 09 Aug 2008, at 4:09 PM, Zohar Stolar wrote:
As yhager proposed, having two DBs instead of one, may ease things a lot. One DB will hold content: nodes, revisions, files, users... The second DB will hold configuration: views, cck structure, modules, variables... (this list is highly debatable).
Prefixing would work just as well for this.
I was thinking of adding properties to the table definition in schema, myself. IE: content / configuration.
I wish it was that simple. I'm afraid this is not to be solved on the database level alone. Some rows in the variable table might count for content, others for configuration. CCK, aka the-other-part-of-core, would be impossible to dissect based on post-mortem DB analysis. The menu system can be argued about, etc.
CCK is configuration of the content, not the content itself. A separation is probably possible in CCK's tables. Menus are content, but the position of their blocks is configuration. Views are configuration; their results are obviously content.
A rule of thumb could be: "If I change X, will it change any content?" If the answer is 'yes', then X is content. Otherwise it's configuration.
On 09 Aug 2008, at 6:46 PM, Zohar Stolar wrote:
CCK is configuration of the content, not the content itself. Probably a separation is possible in CCK's tables. Menus are content, but the position of their blocks is configuration. Views are configuration, their results are obviously content.
Custom menu items, modified custom menus, etc. are configuration, not content.
On 09 Aug 2008, at 6:31 PM, Yuval Hager wrote:
I wish it was that simple. I'm afraid this is not to be solved on the database level alone. Some rows in the variable table might count for content, others for configuration. CCK, aka the-other-part-of-core, would be impossible to dissect based on post-mortem DB analysis. The menu system can be argued about, etc.
Yes, I know, which is why we are having this discussion. I don't think this can be solved from outside of Drupal. We need the information in the code; the example I gave was just a first step that would replicate what we can do at the moment.
You also have nodes / users which might be part of configuration.
There was a BoF about this at the last DrupalCon. Doesn't hurt to keep talking about it, IMHO. Someone should propose it for Szeged if it hasn't been already.
Here's the one from Boston: http://boston2008.drupalcon.org/session/updating-and-upgrading-live-sites
There are some worthwhile links in the proposal as well as in the comments, including this video of some of the discussion: http://www.mikiane.com/node/2008/03/04/live-blogging-drupalcon-boston-2008
-Dave
On Saturday 09 August 2008, Zohar Stolar wrote:
If there is enough interest, maybe we can still sneak in a BoF session in Szeged, although I believe we'll need much more than one meeting to solve this issue ;)
Personally, I don't see how two DBs improve things. In my experience, nodes are often "configuration" as well as "content". Trying to draw that line somewhere is a mistake, IMHO. You might draw the line where it makes sense for your sites, but not for someone else's.
As you point out, the list is highly debatable. I think it's undecidable.
-Dave
On Saturday 09 August 2008, Zohar Stolar wrote:
As yhager proposed, having two DBs instead of one, may ease things a lot. One DB will hold content: nodes, revisions, files, users... The second DB will hold configuration: views, cck structure, modules, variables... (this list is highly debatable).
This will allow for two-ways updating - content is pulled down, while configuration can be easily pushed up.
I think a really interesting project is the deploy module (http://drupal.org/project/deploy), which takes an interesting abstract approach to this issue. It won't meet everybody's needs, and isn't functionally sufficient yet (for content at least), but it is a good start, and worth using to begin a more serious conversation about how the larger Drupal community might address this problem. Some of the concepts that Greg already has in place are critical for the staging issue.
You can check out a talk Greg did on the deploy module at the Seattle Drupal User Group here: http://blip.tv/file/1033300
Anyway, food for thought.
a.
On Aug 9, 2008, at 2:35 PM, Dave Cohen wrote:
Personally, I don't see how two DBs improves things. In my experience, nodes are often "configuration" as well as "content". Trying to draw that line somewhere is a mistake, IMHO. You might draw the line where it makes sense for your sites, but not someone elses.
As you point out, the list is highly debatable. I think it's undecidable.
-Dave
On Saturday 09 August 2008, Zohar Stolar wrote:
As yhager proposed, having two DBs instead of one, may ease things a lot. One DB will hold content: nodes, revisions, files, users... The second DB will hold configuration: views, cck structure, modules, variables... (this list is highly debatable).
This will allow for two-ways updating - content is pulled down, while configuration can be easily pushed up.
--------------------------------------------------- arthur@civicactions.com
On 09 Aug 2008, at 9:53 PM, arthur wrote:
I think a really interesting project is the deploy module (http://drupal.org/project/deploy ) which I think is an interesting abstract approach to this issue. It won't meet everybody's needs, and isn't functionally sufficient yet (for content at least), but is a good start, and worth starting a more serious conversation of how this problem might be addressed from the larger Drupal community. Some of the concepts that Greg has already got in place are critical for the staging issue. You can check out a talk Greg did on the deploy module at the Seattle Drupal User Group here: http://blip.tv/file/1033300
Deploy uses macros, essentially. It captures form submissions and resubmits them to the other sites.
The entire drupal_execute / macro functionality was one of the tertiary goals of the initial design of fapi. Unfortunately we didn't make the design watertight, so it was possible (easy, in fact) to write forms which didn't work with macros. We would have to vet all the existing forms, and tighten up the API to force it to work consistently across all forms, to make deploy work in all cases. A good example of this not working is the CCK 1.x forms: cck_import broke most of the time because the forms weren't built to work consistently with drupal_execute.
Additionally, deploy only works one way, which doesn't help in cases like this (where content/configuration needs to be moved in both directions). The moment you modify anything on the live site, you run into the problem that it's impossible to merge the changes back without wiping the staging / dev database and resubmitting everything.
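The record-and-replay idea, and the way it breaks on forms that are not self-contained, can be sketched with a toy model. This is not the Deploy module's code and it does not call drupal_execute(); macro_record(), macro_replay() and both form handlers are hypothetical names, and the 'field_add' failure is just one invented way a form might depend on state that a recorded submission does not carry.

```php
<?php
// Sketch only: a toy stand-in for macro recording, not Deploy or drupal_execute().

/** Records a submission as plain data so it can be shipped to another site. */
function macro_record(array &$log, $form_id, array $values) {
  $log[] = array('form_id' => $form_id, 'values' => $values);
}

/**
 * Replays recorded submissions against a target site's handlers.
 * Returns the form IDs that could not be replayed.
 */
function macro_replay(array $log, array $handlers, array &$site) {
  $failed = array();
  foreach ($log as $entry) {
    $form_id = $entry['form_id'];
    if (!isset($handlers[$form_id])) {
      $failed[] = $form_id;
      continue;
    }
    $handler = $handlers[$form_id];
    if (!$handler($entry['values'], $site)) {
      $failed[] = $form_id;
    }
  }
  return $failed;
}

$handlers = array(
  // A well-behaved form: everything it needs is in the submitted values.
  'site_information' => function (array $values, array &$site) {
    $site['site_name'] = $values['site_name'];
    return TRUE;
  },
  // A form that expects extra state which the recording never captured.
  'field_add' => function (array $values, array &$site) {
    if (!isset($values['widget_settings'])) {
      return FALSE;
    }
    $site['fields'][] = $values['field_name'];
    return TRUE;
  },
);

$log = array();
macro_record($log, 'site_information', array('site_name' => 'Example'));
macro_record($log, 'field_add', array('field_name' => 'field_price'));

$production = array();
$failed = macro_replay($log, $handlers, $production);
```

Here the site name replays cleanly while 'field_add' ends up in $failed, which is the kind of inconsistency that would have to be vetted form by form.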
On Aug 9, 2008, at 4:10 PM, Adrian Rossouw wrote:
On 09 Aug 2008, at 9:53 PM, arthur wrote:
I think a really interesting project is the deploy module (http://drupal.org/project/deploy ) which I think is an interesting abstract approach to this issue. It won't meet everybody's needs, and isn't functionally sufficient yet (for content at least), but is a good start, and worth starting a more serious conversation of how this problem might be addressed from the larger Drupal community. Some of the concepts that Greg has already got in place are critical for the staging issue. You can check out a talk Greg did on the deploy module at the Seattle Drupal User Group here: http://blip.tv/file/1033300
Deploy uses macros, essentially. It captures form submissions and resubmits them to the other sites.
The entire drupal_execute / macro functionality was one of the tertiary goals of the initial design of fapi, unfortunately we didn't make the design water tight, so it was possible (easy in fact) to write forms which didn't work with macros. We would have to vet all the existing forms, and tighten up the API to force it to work consistently across all forms, to make deploy work in all cases. A good example of this not working is that cck 1.x forms. cck_import broke most of the time because the forms weren't built to work consistently with drupal_execute.
Additionally, deploy only works one way. Which doesn't help in cases like this (where content/configuration needs to be moved both directions). The moment you modify anything on the live site, you run into the problem where it's impossible to merge the changes back, without wiping the staging / dev database, and resubmitting everything.
Adrian-
I think you're right that it isn't a perfect solution (as it currently stands). Beyond what you've mentioned, there are dependency chains for users, content and configurations that could get really, really messy.
That being said, I think it's pretty clear that many people in the Drupal community are really interested in a solution for this. Some agreement on how this problem could be tackled could really foster progress. Perhaps I'm being naive, but it does seem like some of the tools that various folks in the community are using are getting closer to being quite functional for lots of people. With some focused development time (a sprint perhaps?), it seems as though some of the more daunting pieces (reverting changes, for example) might be approachable.
Anyway, I was just hoping to bring some awareness to one of the tools that does exist now, and could potentially serve as a model to build from. Of course, I'm not even the author, so maybe Greg should chime in :)
--------------------------------------------------- arthur@civicactions.com
On 09 Aug 2008, at 10:31 PM, arthur wrote:
That being said, I think it's pretty clear that many people in the Drupal community are really interested in a solution for this. A sense of agreement of how this problem could be tackled could really foster progress. Perhaps I'm being naive, but it does seem like some of the tools that various folks in the community are using are getting closer to being quite functional for lots of people. With some focused development time (a sprint perhaps?), it seems as though some of the more daunting pieces (reverting changes for example) might be approachable.
Indeed, which is why we are having this discussion.
Anyway, I was just hoping to bring some awareness to one of the tools that does exist now, and could potentially serve as a model to build from. Of course, I'm not even the author, so maybe Greg should chime in :)
If we intend to build from it, we need to identify the issues relating to it, and the work required to make it work perfectly, which is all I stated.
We need to inspect all the ways people are currently working around this, so we can choose the best mechanism to focus on.
I created the dbscripts module to assist me with many of these issues. The issues are:
1. Certain settings should be set differently in production and development environments (like caching).
2. Content data can be created, deleted and modified in both production and development.
3. The database schema can change and be manipulated on the fly (i.e. CCK).
4. The development process (and thus the process of recording changes) must be fast and must hinder the development team as little as possible.
Issue 1 can be solved for any configuration setting that is stored in the variable table by setting it at the bottom of the settings.php file (read the bottom of that file for details). To expand on this, it would be great to give people more flexibility to set configuration settings within a file instead of the database (could this also improve performance?).
Issues 2 and 3 are the primary issues. In issue 2, some people want the capability to merge more than just content, but certain configuration settings too, such as blocks, site name, etc. This is important to those who allow clients to manipulate what they can handle on the site, and whose clients hire them for the big features.
Issue 3 is what makes merging this stuff hard. The dbscripts module depends upon the use of update.php and content_copy in order to merge content. Another database issue is that one row can hold different types of data: content, configuration and user data. For example, the users table stores configuration (UID, username), content (email address) and user data (last visited). Blocks hold both the content of the block and its location. This makes it difficult to merge the changes made to these rows if they were modified in both development and production.
Lastly, issue 4 is vitally important. You want to minimize repeating any steps. You don't want to have to test a configuration setting, then repeat all your steps while recording it. Development needs to be fast, minimizing the time spent on version control.
My dream is to have automatically created database migrations for Drupal. The database is being manipulated constantly; is there anything preventing us from recording those manipulations? If we could somehow record each change made to the database automatically, as its original SQL query, and then export those changes to a set of incremental files, we could then run a script that would go through each of the migration scripts to update the database in your working space.
Optimally, if there were automatic database migrations, the SQL queries would have to be specific. Each query should target a specific column in a row, so that an editor in production could change the content of a block while the developer changes its visibility settings. Effectively, the goal is that both production and development could manipulate data in the same row while minimizing conflicts. This would allow a team to move database changes both "down" from production to development and "up" from development to production with relative ease, even on large databases (which dbscripts can't handle).
Is this technically possible with Drupal? Is the barrier just finding someone who will sit down and do it? My original goal when creating dbscripts was to do this, but I simply don't know enough about the database layer. If only I knew more about it, I would do it.
--- Kathleen Murtagh
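The column-level migration idea can be illustrated with a small sketch. It is a toy change log held in PHP arrays, not how dbscripts or Drupal's database layer works; change(), apply_changes() and the single 'block' table are hypothetical and do not mirror Drupal's real table layout. Each recorded change targets one column of one row, so edits from production and development only conflict when they touch the same cell.

```php
<?php
// Sketch only: a toy change log, not dbscripts and not Drupal's database layer.

/** Builds one recorded change: a single column of a single row. */
function change($table, $key, $column, $value, $origin) {
  return compact('table', 'key', 'column', 'value', 'origin');
}

/**
 * Applies changes to $db in order and returns the ones that conflict,
 * i.e. that set a cell already set to a different value by another origin.
 */
function apply_changes(array &$db, array $changes) {
  $touched = array();
  $conflicts = array();
  foreach ($changes as $c) {
    $cell = $c['table'] . ':' . $c['key'] . ':' . $c['column'];
    if (isset($touched[$cell])
        && $touched[$cell]['origin'] !== $c['origin']
        && $touched[$cell]['value'] !== $c['value']) {
      $conflicts[] = $c;
      continue;
    }
    $db[$c['table']][$c['key']][$c['column']] = $c['value'];
    $touched[$cell] = $c;
  }
  return $conflicts;
}

$db = array('block' => array(7 => array('body' => 'Old text', 'visibility' => 0)));

$changes = array(
  change('block', 7, 'body', 'New text', 'production'),     // editor edits content
  change('block', 7, 'visibility', 1, 'development'),       // developer edits settings
  change('block', 7, 'body', 'Other text', 'development'),  // same cell: conflict
);

$conflicts = apply_changes($db, $changes);
```

The first two changes merge into the same row without trouble; only the third, which targets a cell production already changed, is reported back for a human to resolve.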
On Saturday 09 August 2008 15:43:35 Adrian Rossouw wrote:
On 09 Aug 2008, at 10:31 PM, arthur wrote:
That being said, I think it's pretty clear that many people in the Drupal community are really interested in a solution for this. A sense of agreement of how this problem could be tackled could really foster progress. Perhaps I'm being naive, but it does seem like some of the tools that various folks in the community are using are getting closer to being quite functional for lots of people. With some focused development time (a sprint perhaps?), it seems as though some of the more daunting pieces (reverting changes for example) might be approachable.
indeed, which is why we are having this discussion.
Anyway, I was just hoping to bring some awareness to one of the tools that does exist now, and could potentially serve as a model to build from. Of course, I'm not even the author, so maybe Greg should chime in :)
If we intend to build from it, we need to identify the issues relating to it, and the work required to make it work perfectly, which is all I stated.
We need to inspect all the ways people are currently working around this, so we can choose the best mechanism to focus on.
Given that heyrocker/Greg is now at Palantir, deploy is on our minds...let me try to not steal any of his thunder :)
So we've talked a bit about deploy and this general problem of maintaining multiple different instances of a given site, where those instances may have different purposes/states within the overall workflow; allowing for data & changes to flow to/from/around those various different instances is the killer problem. I think we came up with a 'holy grail' that had dev state(s), then qa, staging, and prod. Primary keys are the big killer there, of course, and while I have considerable admiration for the folks who've come up with working models for ensuring no primary key conflicts (which afaik are either even/odd or start-at-key-X approaches), I'd hope there's agreement that we could do better. We've batted some thoughts around, but I'll leave it up to Greg on whether or not he feels like they're ripe for public digestion yet - this is his baby =)
I will say, though, that there's a basic approach present in deploy which _does_ have the potential to become a systemic solution to this problem (IMO): it's pluggable. Instead of trying to figure out what data should or shouldn't be transferred (which is essentially what we're doing when we pick out parts of our db dumps to push/pull), it just presents an API and lets the modules implementing it decide the logic that ought to govern synchronizing data between various site states.
That, to me, seems like the approach most likely to end with positive results: deploy abstracts synchronization complexities into a simple API. Modules implementing it just have to answer some fairly basic questions about how their data is structured, and deploy takes care of the rest.
Sam
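A pluggable design of this kind might look roughly like the sketch below. The names sync_register() and sync_push() are hypothetical and this is not the Deploy module's actual API; the sites are plain arrays standing in for real Drupal instances. Each module contributes an export and an import callback for its own data type, and the framework only moves the exported data across.

```php
<?php
// Sketch only: hypothetical names; this is not the Deploy module's actual API.

/** Lets a module claim a data type by supplying export and import callbacks. */
function sync_register(array &$registry, $type, $export, $import) {
  $registry[$type] = array('export' => $export, 'import' => $import);
}

/**
 * Moves one item of the given type from $source to $target.
 * Returns FALSE when no module has registered the type.
 */
function sync_push(array $registry, $type, $id, array $source, array &$target) {
  if (!isset($registry[$type])) {
    return FALSE;
  }
  $export = $registry[$type]['export'];
  $import = $registry[$type]['import'];
  $import($target, $export($source, $id));
  return TRUE;
}

$registry = array();

// The module owning content types decides how they are read and written.
sync_register($registry, 'content_type',
  function (array $site, $id) {
    return $site['content_types'][$id];
  },
  function (array &$site, array $data) {
    $site['content_types'][$data['name']] = $data;
  }
);

$staging = array(
  'content_types' => array('event' => array('name' => 'event', 'label' => 'Event')),
);
$production = array('content_types' => array());

$pushed = sync_push($registry, 'content_type', 'event', $staging, $production);
$unknown = sync_push($registry, 'menu', 'primary', $staging, $production);
```

The framework never needs to know what a content type looks like; a type nobody has registered (here 'menu') is simply refused rather than guessed at.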
On 10 Aug 2008, at 1:22 AM, Sam Boyer wrote:
Modules implementing it just have to answer some fairly basic questions about how their data is structured, and deploy takes care of the rest.
Any solution we build has to be modular. If it's kept fairly simple, I expect that there's enough need that most of the important modules would get covered pretty easily.
When I created Deploy I had a set of goals in mind that I felt a solution had to meet. I think (hope) that most people will agree that these goals represent the ideal of what any staging and deployment system should achieve. Some of this will repeat what has already been said insightfully and thoughtfully above, but I wanted it all put together in one place.
I wanted to see a solution that focused on what can be done within Drupal, without modifying core. That includes not modifying the default installs of core tables (to reserve IDs, for instance). I see a lot of people focusing on the database when talking about deployment, which I think is the wrong way to look at things. When you're moving a node, you should deal with it the same way the rest of Drupal does: load it, save it. The only thing missing is how to move it from one place to another. We should use what Drupal gives us.
Everything should be deployable: content, comments, taxonomy, users, configuration, content types, the whole nine yards. When "things" are deployed, their references to other "things" should be preserved.
It must work bi-directionally. If I make a change to my Pathauto config, I should be able to push it forward. Conversely, if a user submits a new comment, that comment should be able to be pushed back to my staging site (and it should still be linked to the proper node). Kathleen makes a good point that we need to account for things that are different between servers by design.
Modules should be able to hook into the deployment process easily and with a great deal of flexibility (this is the part I think my module currently gets pretty right).
This framework could live in core or not; I really think it doesn't matter in the end. As we recently saw with D6 & Views, any module with a sufficiently large install base is "core" in that people won't build sites without it and won't build modules without integrating with it. (Please note that I am not implying my module is anywhere near the level of Views.) Build a deployment system that solves everyone's problems and it will get implemented throughout the spectrum.
Any system should also be flexible enough to meet the needs of all users. That means there should be a point-and-click UI for non-developers to be able to deploy things, but also the ability to move the information into code for traditional deployment, as Adrian wants. Things should be able to be deployed out of cron or published on a timetable. Different sites have different needs, and with a flexible, sensible API this shouldn't be a problem.
There is really only one thing standing in the way of all of this happening, and that is the fact that we cannot uniquely identify most things between servers. I create a node on my dev server and it is given the ID of 10. It goes through my editorial workflow until it's done and I push it out. However, on my live site node 10 is a forum post that one of my users made, so my new node becomes node 11. Now I make a change to this node on my dev site and there's no way to push the update. The same goes for bringing forum node 10 back to my dev server. This is why all the ID-specific hoop-jumping happens. The only reason Deploy works at all is because I'm pushing around things like CCK content types that can be uniquely identified through a text string (the user-specified machine name).
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition. As Sam implied, I've been conceptualizing a solution based on this premise, but it's been hard finding time to really work on it. Hopefully soon.
Having made that very long, rambling post (sorry, it's been a long day and it's late), I think there's a lot of room for many solutions depending on your needs, and it's unlikely that any "one size fits all" answer will be forthcoming. Sites have different needs, different resources and different workflows, and thus it's natural that different solutions will develop. People like Kathleen and Dave and the Bryght/Raincity folks and many others have done great work in this sphere, and I'm sure they will continue to. The more smart people are thinking about it and working on it, the more ideas and knowledge will be built, which can only lead to good things down the road.
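The node 10 / node 11 collision described above, and how a GUID would sidestep it, can be shown with a toy model. The functions site_create() and site_save() are hypothetical, the sites are plain arrays, and fixed strings stand in for generated GUIDs; nothing here reflects an existing Drupal or Deploy API.

```php
<?php
// Sketch only: a toy model of GUID-based identity; all names are hypothetical.

/** Creates an empty site whose next locally assigned nid is $first_nid. */
function site_create($first_nid) {
  return array('nodes' => array(), 'guid_to_nid' => array(), 'next_nid' => $first_nid);
}

/**
 * Saves a node identified by GUID. An unknown GUID gets the next free
 * local nid; a known GUID updates the existing node in place.
 * Returns the local nid.
 */
function site_save(array &$site, $guid, $title) {
  if (!isset($site['guid_to_nid'][$guid])) {
    $site['guid_to_nid'][$guid] = $site['next_nid']++;
  }
  $nid = $site['guid_to_nid'][$guid];
  $site['nodes'][$nid] = array('guid' => $guid, 'title' => $title);
  return $nid;
}

$dev = site_create(10);
$live = site_create(10);

$dev_nid = site_save($dev, 'guid-article', 'Draft article');    // nid 10 on dev
site_save($live, 'guid-forum-post', 'Forum post');              // nid 10 on live
$live_nid = site_save($live, 'guid-article', 'Draft article');  // nid 11 on live

// A later edit made on dev still finds the right node on live.
site_save($dev, 'guid-article', 'Edited article');
$updated_nid = site_save($live, 'guid-article', 'Edited article');
```

The local nids still differ between the two sites, but they no longer matter for synchronization: the GUID is the only identifier that crosses the wire.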
On 10 Aug 2008, at 9:18 AM, Greg Dunlap wrote:
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition. As Sam implied I've been conceptualizing a solution based on this premise, but its been hard finding time to really work on it. Hopefully soon.
We could create an object table pretty simply, and just keep that maintained, with the object type, object area (i.e. production, staging, etc.) and reference id. We could then just write drivers for each of the object types.
Most of the 'drivers' would be really simple, i.e. just wrapping node_load and node_save or drupal_execute.
We could make it a lot simpler to support the object table by writing a very simple fapi submit handler that gets added onto forms that need to be indexed in the oid table:
$form['#submit']['drupal_object_register'] = array('node', 'nid', 'area'); // just tells it to use $form_values['nid'] and $form_values['area'] when sticking it in the table.
We would also be able to handle versioning, although tracking node revisions like this on a busy site with revision tracking could get difficult. Add a datestamp on there, and you would have the proper order in which to import objects too.
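As a rough illustration of the object table and its drivers, here is a minimal sketch. The columns type, area and reference id come from the message above; the oid and timestamp columns, the function names, and the array-based "site" are assumptions made for the example. Real drivers would presumably wrap calls such as node_load() and node_save(); the ones below only read from an array.

```php
<?php
// Sketch only: a toy object table. Function names and the oid/timestamp
// columns are assumptions, not an existing Drupal API.

/** Adds a row to the object table. */
function object_register(array &$table, $oid, $type, $area, $ref_id, $timestamp) {
  $table[] = compact('oid', 'type', 'area', 'ref_id', 'timestamp');
}

/** Returns the rows for one area, oldest first, as an import order. */
function objects_in_import_order(array $table, $area) {
  $rows = array_values(array_filter($table, function ($row) use ($area) {
    return $row['area'] === $area;
  }));
  usort($rows, function ($a, $b) {
    return $a['timestamp'] - $b['timestamp'];
  });
  return $rows;
}

// Stand-in for a staging site's data, keyed by local id.
$staging = array(
  'node' => array(3 => 'About page'),
  'user' => array(2 => 'editor'),
);

// One driver per object type; each loads an object by its local id.
$drivers = array(
  'node' => function (array $site, $id) { return $site['node'][$id]; },
  'user' => function (array $site, $id) { return $site['user'][$id]; },
);

$table = array();
object_register($table, 'oid-2', 'node', 'staging', 3, 200);
object_register($table, 'oid-1', 'user', 'staging', 2, 100);
object_register($table, 'oid-3', 'node', 'production', 9, 150);

$export = array();
foreach (objects_in_import_order($table, 'staging') as $row) {
  $load = $drivers[$row['type']];
  $export[] = array('oid' => $row['oid'], 'data' => $load($staging, $row['ref_id']));
}
```

Sorting by the datestamp puts the user ahead of the node created after it, which is the ordering benefit mentioned above; whether creation order is always a safe import order is an open question this sketch does not settle.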
This whole thread is exceptionally profound and of great interest to those who spend their days actually using Drupal with clients and getting sites done.
I think it is correct to avoid a purely database-driven analysis of the problem, and that both the problem and the solution should be expressed in Drupal terms, that is, in terms of Drupal building blocks like users, nodes, comments, menus, blocks or panels: different kinds of entities, their listers and containers.
But the deeper the discussion goes, the more it resembles a reverse engineering of Drupal. And any approach that X-rays a Drupal instance in order to dissect and reassemble it elsewhere, in whole or in part, is unfortunately going to smack hard against design decisions already made in Drupal.
Perhaps it would do well to focus attention on single points capable of being improved upon. There have been many attempts, and some satisfying solutions have been implemented, but there is no straightforward way to export and import node content in Drupal. Forget about "structural" nodes for the moment (that is of course important; I mean just to simplify). There is no straightforward way in Drupal to export and import a given content type's nodes. No off-the-shelf, simple way of doing it.
I can try to marshal the failed export-import module. I can more successfully utilize the services module. But in that case, if there are complications in the field types, I need to write my own {node-type}.save service. That is, I cannot export/import content in Drupal without writing PHP code. I think that is one of the biggest weaknesses that confront us.
Deploy seeks to solve this, but falls down, through no fault of its own, over the difficulty of exporting and cleanly importing CCK content type definitions.
The solution probably lies in extracting all objectively structural information architecture decisions (panels, menus, blocks) into text-based, version-controllable definitions, together with configuration elements, which are currently "serializable" via the Installation Profile Generators: http://drupal.org/node/180078.
I think a beginning would be to ensure that Drupal 7 has an export/import facility for node types, and that CCK, Views, Panels and what-have-you work on improving their individual import/export. Anything on a grander scale might smack hard against existing Drupal design decisions.
Victor Kane http://awebfactory.com.ar
On Sun, Aug 10, 2008 at 5:15 AM, Adrian Rossouw <adrian@bryght.com> wrote:
On 10 Aug 2008, at 9:18 AM, Greg Dunlap wrote:
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition. As Sam implied I've been conceptualizing a solution based on this premise, but its been hard finding time to really work on it. Hopefully soon.
We could create an object table pretty simply, and just keep that maintained, with the object type, object area (ie: production, staging, etc) and reference id. And could then just write drivers for each of the object types.
Most of the 'drivers' would be really simple, ie; just wrapping node_load and node_save or drupal_execute.
We could make it a lot simpler to support the object table by writing a very simple fapi submit handler that gets added onto forms that need to be indexed in the oid table :
$form['#submit']['drupal_object_register'] = array('node', 'nid', 'area'); // just tells it to use $form_values['nid'] and $form_values['area'] when sticking it in the table.
We would also be able to handle versioning, although the idea of tracking node revisions like this on a busy site with revision tracking could get difficult. Add a datestamp on there, and you could have the proper order to import objects too.
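A minimal sketch of the object table idea above, in plain PHP, with an in-memory array standing in for the database table. The function names object_register() and object_lookup(), the 'guid' and 'changed' columns, and the use of uniqid() are assumptions made for illustration; none of them are existing Drupal APIs.

```php
<?php
// Hypothetical in-memory stand-in for the proposed object table.
// Each row holds: shared guid, object type, area, local reference id,
// and a datestamp that could later be used to order imports.
$object_table = array();

// Register a local object. Pass an existing $guid to record that this
// local id is the same "thing" as one already known in another area.
function object_register(array &$table, $type, $area, $ref_id, $guid = NULL) {
  if ($guid === NULL) {
    // New object: mint an identifier (illustrative choice only).
    $guid = uniqid($type . '-', TRUE);
  }
  $table[] = array(
    'guid' => $guid,
    'type' => $type,
    'area' => $area,
    'ref_id' => $ref_id,
    'changed' => time(),
  );
  return $guid;
}

// Find the local reference id of an object in a given area, or NULL.
function object_lookup(array $table, $guid, $area) {
  foreach ($table as $row) {
    if ($row['guid'] === $guid && $row['area'] === $area) {
      return $row['ref_id'];
    }
  }
  return NULL;
}

// The same node is nid 12 on staging and nid 57 on production.
$guid = object_register($object_table, 'node', 'staging', 12);
object_register($object_table, 'node', 'production', 57, $guid);
```

A per-type 'driver' would then only need the looked-up local id to call something like node_load or node_save on the right site.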
With the help of a GUID and a timestamp of the last write access in every table, it would be easy to synchronize databases. If you want to synchronize only certain elements such as variables, nodes, users, or configuration, modules could provide a hook which:
- just lists the db tables to synchronize
- gets the result of the db compare and lets the user decide what will be pushed to the other side.
Both things could be handled by the db_query() function, to make sure no one forgets to touch the access timestamp and that accurate GUIDs are created.
On 10 Aug 2008, at 3:19 PM, Ernst Plüss wrote:
With the help of a GUID and a timestamp of the last write access in every table it would be easy to synchronize databases. If you want to synchronize only certain elements like variables, nodes, users, configuration etc. modules could provide a hook which
Not all tables have timestamps, and it would require adding a whole mess of extra columns.
- just lists the db tables to synchronize
- gets the result of the db compare and lets the user decide what will be pushed to the other side.
That's a lot of functionality, considering the weird merging of tables, and additional foreign keys. Take for example a simple single -> multi relationship. You'd have to compare the bunch of them, and understand how to generate them all.
All this stuff is already in most of the _load functions. It's also a LOT easier to compare 2 objects to each other (take the diff module for instance). Also, the db updates then get related very directly to the schema the modules export.
Both things could be handled by the db_query() function, to make sure no one forgets to touch the access timestamp and creates accurate GUID's.
I think that perhaps db_query is too low a level for this, as that would add extra cycles on every single query done to the database, and probably also parsing the query (or changing all the db_query calls in core with extra parameters). And it's also not that hard to make sure the access stamp isn't changed, and is only related to nodes.
2008/8/10 Adrian Rossouw <adrian@bryght.com>
On 10 Aug 2008, at 3:19 PM, Ernst Plüss wrote:
With the help of a GUID and a timestamp of the last write access in every table it would be easy to synchronize databases. If you want to synchronize only certain elements like variables, nodes, users, configuration etc. modules could provide a hook which
Not all tables have timestamps, and it would require adding a whole mess of extra columns.
Adding timestamp columns would not be too hard. For MySQL you don't even have to change the module code, since the DB does everything for you: upon every write operation a new timestamp is set automatically. The GUID would be a little harder. AFAIK there's no automatic insertion of GUIDs for primary keys, but the functionality could be integrated into the db_xyz functions.
- just lists the db tables to synchronize
- gets the result of the db compare and lets the user decide what will be pushed to the other side.
That's a lot of functionality, considering the weird merging of tables, and additional foreign keys. Take for example a simple single -> multi relationship. You'd have to compare the bunch of them, and understand how to generate them all.
For new and updated rows it's actually very easy. First check the timestamp. Then check for the GUID. If it's there, update the row from the source db; if not, insert it. This always works and does not depend on the relationships between the tables. The hard part is the deleted rows. You have to check for every GUID in the destination db whether it still exists in the source db. If you have a lot of data this can take some time. The hook is meant as a help for module developers in case you don't want to migrate a whole db but only, let's say, a certain node type. So you can do some filtering or whatever makes sense.
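The three steps described above (timestamp check, GUID-based update or insert, deletion check) can be sketched in plain PHP. This is an illustration under assumptions: rows are held in arrays keyed by GUID, 'changed' is the last-write timestamp, and sync_rows() is a hypothetical name. It is a one-way push, so anything that exists only in the destination is treated as deleted, which is exactly the case a real staging tool would have to handle more carefully.

```php
<?php
// One-way synchronization of rows keyed by GUID.
// $last_sync is the timestamp of the previous synchronization run.
function sync_rows(array $source, array $dest, $last_sync) {
  foreach ($source as $guid => $row) {
    // Step 1: skip rows that were not written since the last sync.
    if ($row['changed'] <= $last_sync) {
      continue;
    }
    // Step 2: the GUID decides between update and insert. With arrays
    // both cases are the same assignment; in SQL they would differ.
    $dest[$guid] = $row;
  }
  // Step 3: deletions. Every destination GUID has to be checked against
  // the source, which is the expensive part on large data sets.
  foreach (array_keys($dest) as $guid) {
    if (!isset($source[$guid])) {
      unset($dest[$guid]);
    }
  }
  return $dest;
}

$source = array(
  'a1' => array('title' => 'About', 'changed' => 100),
  'b2' => array('title' => 'News (edited)', 'changed' => 250),
  'd4' => array('title' => 'Contact', 'changed' => 300),
);
$dest = array(
  'a1' => array('title' => 'About', 'changed' => 100),
  'b2' => array('title' => 'News', 'changed' => 150),
  'c3' => array('title' => 'Old page', 'changed' => 90),
);

// 'b2' is updated, 'd4' is inserted, 'c3' is removed, 'a1' is untouched.
$result = sync_rows($source, $dest, 200);
```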
All this stuff is already in most of the _load functions. It's also a LOT easier to compare 2 objects to each other (take the diff module for instance).
Also, the db updates then get related very directly to the schema the modules export.
Both things could be handled by the db_query() function, to make sure no one forgets to touch the access timestamp and creates accurate GUID's.
I think that perhaps db_query is too low a level for this, as that would add extra cycles on every single query done to the database, and probably also parsing the query (or changing all the db_query calls in core with extra parameters).
And it's also not that hard to make sure the access stamp isn't changed, and is only related to nodes.
On Sun, Aug 10, 2008 at 6:06 AM, Victor Kane <victorkane@gmail.com> wrote: The solution probably exists in terms of extracting all objectively
structural information architecture decisions (panels, menus, blocks) into text-based version-controllable definitions, together with configuration elements: currently "serializable" via the Installation Profile Generators: http://drupal.org/node/180078.
I think a beginning would be to ensure that for Drupal 7 there existed an export/import facility for node types; and that CCK, Views, Panels and what-have-you work on improving their individual import/export.
Alex Barth recently contacted me about exactly this, having come to the exact same conclusion. He has opened an issue in deploy's issue queue: http://drupal.org/node/291921
He himself wrote a module quite similar to Deploy called Ports, which is pretty interesting and abstracts import/export a bit beyond what I did. Maybe he can chime in here about what he's done, because I've only taken a cursory glance at it.
He also pointed out to me that there are many admin/config level problems that are currently being worked on individually and could probably benefit from some communication. For instance, having well-defined import/export functionality for all "things" in Drupal would help not only deployment but install profiles, and probably a bunch of other admin/config-related issues. I'd love to see everyone discussing their problems in a more public setting, not here but perhaps in the Change Management group on g.d.o., so we can pool resources on problems when it makes sense. For all that people seem to be struggling with these issues, that group is oddly dead.
For more food for thought, there is a ton of interesting discussion in the comments on my initial blog post about Deploy: http://heyrocker.com/drupal/content/deployment-and-change-management-framewo...
On Aug 10, 2008, at 12:22 PM, Greg Dunlap wrote:
On Sun, Aug 10, 2008 at 6:06 AM, Victor Kane <victorkane@gmail.com> wrote:
Alex Barth recently contacted me about exactly this, having come to the exact same conclusion. He has opened an issue in deploy's issue queue:
He himself wrote a module quite similar to Deploy called Ports which is pretty interesting and abstracts import/export a bit beyond what I did. Maybe he can chime in here about what he's done because I've only taken really a cursory glance at it.
I'm happy that this discussion is coming up. I started writing port module for generating installer profiles and was then pleasantly surprised to see similarities with Greg's deploy module.
At port module's core there is a very simple idea: provide a hook to let modules define a matching export and import function. Here is the ports hook definition for imagecache for example:

  // Implementation of hook_ports()
  function imagecache_ports() {
    $ports = array();
    $ports['imagecache_presets']['name'] = t('Image cache presets');
    $ports['imagecache_presets']['export callback'] = 'imagecache_export_presets';
    $ports['imagecache_presets']['import callback'] = 'imagecache_import_presets';
    $ports['imagecache_presets']['type'] = PORT_STRUCTURE; // not yet implemented
    $ports['imagecache_presets']['version'] = '1.68.2.3'; // not yet implemented
    return $ports;
  }

This simple approach is pretty powerful, because it makes it really easy to generate a single export array that contains all the information for importing it on a target site. In its simplest applications, you can copy / paste this array to your target site or you can use it to generate install functions for an install profile (both operations currently supported by port module).
After doing a first proof of concept as a result of a client project, I saw how similar certain approaches in deploy module are and got in touch with Greg. I had another look at deploy module this weekend: I'm actually thinking that a deploy-like module and a port-like module could play very well together, with port providing the structure in which modules should export and import configuration, and deploy providing the XML-RPC integration, deployment functionality and the UI.
Another module that could be implemented on top of port module is the install profile wizard (recent discussion on a related thread: http://drupal.org/node/230059#comment-957328).
That said, port is a proof of concept. This is the current status and limitations:
# Basic hook_ports() system defined
# UI for copy/paste deployment of export data
# UI for generating callable PHP functions
# on-behalf implementations for menus, user roles and permissions, module status, content types, imagecache, nodeprofile and spaces.
# while there is a flag for PORT_STRUCTURE and PORT_CONTENT, I haven't thought deeper about content export/import - I'm just thinking that it's a very closely related problem
# while there's a slot for version number, port isn't handling any version comparisons atm
# there is no 'update' handling, no notion of mapping
# there is no way of exporting parts of a module's configuration yet. While modules can define more than one export / import pair, there's no way of exporting just a part of one export function, e.g. export only certain imagecache definitions, not all of them. This one should be easy.
Ideas for update handling in port: I'm leaning towards not dealing with it on this level. In the existing implementation modules would deal with create new vs update by themselves. I haven't thought much about this though, and there might be a smart helper that port could provide so that we know on this level what's an update and what's new.
Plans for port: don't know yet. I'm in touch with Greg on deploy and I plan to get in touch with Boris Mann on install profile wizards. These are the two projects where I see overlap.
I'd be very curious to get your thoughts on port module's approach. Check out the code here: http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/alex_b/port/
-
Alex Barth http://www.developmentseed.org/blog tel (202) 250-3633
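To make the "single export array" idea concrete, here is a rough sketch in plain PHP of how hook_ports() implementations might be collected and run. It is not taken from the port module's code: the helpers port_collect(), port_export_all() and port_import_all(), the 'example' module and its storage in $GLOBALS are all assumptions made so the sketch can run outside Drupal.

```php
<?php
// Stand-in for the constant the port module is described as providing.
define('PORT_STRUCTURE', 'structure');

// Fake storage for one module's settings, so the sketch runs on its own.
$GLOBALS['example_presets'] = array('thumb' => array('width' => 100));

// A hook_ports() implementation shaped like the imagecache one.
function example_ports() {
  $ports = array();
  $ports['example_presets']['name'] = 'Example presets';
  $ports['example_presets']['export callback'] = 'example_export_presets';
  $ports['example_presets']['import callback'] = 'example_import_presets';
  $ports['example_presets']['type'] = PORT_STRUCTURE;
  return $ports;
}

function example_export_presets() {
  return $GLOBALS['example_presets'];
}

function example_import_presets($data) {
  $GLOBALS['example_presets'] = $data;
}

// Hypothetical helper: gather every port declared by the given modules.
function port_collect(array $modules) {
  $ports = array();
  foreach ($modules as $module) {
    $function = $module . '_ports';
    if (function_exists($function)) {
      $ports = array_merge($ports, call_user_func($function));
    }
  }
  return $ports;
}

// Hypothetical helper: run all export callbacks into one export array.
function port_export_all(array $modules) {
  $export = array();
  foreach (port_collect($modules) as $key => $port) {
    $export[$key] = call_user_func($port['export callback']);
  }
  return $export;
}

// Hypothetical helper: hand each exported chunk to its import callback.
function port_import_all(array $modules, array $export) {
  $ports = port_collect($modules);
  foreach ($export as $key => $data) {
    if (isset($ports[$key])) {
      call_user_func($ports[$key]['import callback'], $data);
    }
  }
}

// Export on the "source" site.
$export = port_export_all(array('example'));
// Simulate an empty "target" site, then import the same array there.
$GLOBALS['example_presets'] = array();
port_import_all(array('example'), $export);
```

Because each module decides what its import callback does, create-versus-update handling stays with the module in this sketch, which matches the leaning described above.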
ceardach has some awesome scripts for deployment. http://ceardach.com/blog/tags/deployment http://drupal.org/project/dbscripts I haven't tried them yet. But figured I'd introduce them to this conversation. Margie
On Monday 11 August 2008 10:12:18 Marjorie Roswell wrote:
ceardach has some awesome scripts for deployment.
http://ceardach.com/blog/tags/deployment http://drupal.org/project/dbscripts
I haven't tried them yet. But figured I'd introduce them to this conversation.
Margie
She introduced them herself on Saturday =)
On 11 Aug 2008, at 5:12 PM, Marjorie Roswell wrote:
ceardach has some awesome scripts for deployment.
http://ceardach.com/blog/tags/deployment http://drupal.org/project/dbscripts
One of the things I like about the 64-bit uid discussion going on in parallel is that it removes a lot of the need for the primary key trickery. You can be sure that stuff you import/export has a unique key, and the keys don't have to be handled in contiguous blocks like those scripts do it.
On Mon, Aug 11, 2008 at 8:42 AM, Adrian Rossouw <adrian@bryght.com> wrote:
One of the things i like about the 64 bit uid discussion going on in parallel, is it removes a lot of need for the primary key trickery.
How complicated would it be to move from sequential ids to UUIDs? This is something I care about more for aesthetic than technical reasons. I'd like nodes/users/etc. to be less obviously sequential (hey, look at that UID, what a n00b). -- Katherine Senzee (ksenzee) esquaredworkshops.com
It would be extremely difficult. For instance:
- Upgrading an existing site would be virtually impossible. Every id would change, plus you would have to update every foreign key reference to every id throughout core AND contrib.
- For sites not running pathauto, every URL would change.
That's just a couple of the purely technical reasons. I would also add that a conscious decision was made to move away from PHP-generated IDs in Drupal 6, and it seems to me very unlikely that this will be reversed. There are several issues around this in the queues. It would offer us up some great functionality but I just don't see it happening.
Also I like knowing who the n00bs are :P (Hi, my name is Greg Dunlap and I'm a n00b.)
On 11 Aug 2008, at 10:25 PM, Greg Dunlap wrote:
- Upgrading an existing site would be virtually impossible. Every id would change, plus you would have to update every foreign key reference to every id throughout core AND contrib.
Yes, we'd need foreign key support in schema api, but that is something I think we should end up doing anyway (and no, not in a way that slows down the work in db api for 7).
- For sites not running pathauto, every URL would change.
Sites not running pathauto would have aliases created for the existing urls.
That's just a couple of the purely technical reasons. I would also add that a conscious decision was made to move away from PHP-generated IDs in Drupal 6, and it seems to me very unlikely that this will be reversed. There are several issues around this in the queues. It would offer us up some great functionality but I just don't see it happening.
If it needs to stay in the db, it's possible to use mysql and pgsql functions to create a next value. That would of course cut us off from SQLite.
I'm not entirely wedded to the idea, and even if it's supported, I foresee it being an option and not the default mechanism for doing things, for at least a release or two, but it could help us solve a few very interesting problems down the line, apart from just the import/export problem. I do however think we should look at extending D6's import/export functionality through contrib, as it will help us pick up issues in core that can be helped along with core api changes, and makes the functionality available to site developers sooner rather than later (ie: let's not boil the ocean).
Reading through this, I got the sense that not everyone is talking about the same thing. Something that may be useful is to break the problem down into the various areas of staging that need to be accounted for.
In Drupal, you really need to stage about 5 types of things (this list can grow a bit):
1) Themes - changes to a theme should not be made on a live server
2) Modules - esp. if you are developing a custom module, it should not be made on a production server.
3) Content - content CAN be staged on a production server with Drupal's publish and workflow features.
4) Dynamic Content - dynamic content cannot really be staged on a production server, from the standpoint that you are not going to see the different permutations of blocks while information is not published.
5) Users - if you are setting up a bunch of new users, staging them on a production server is not a good idea.
There are some strategies I have used in the past to accomplish all of these things. Each solution was specific to a client. A short description follows, and I would love to hear everyone's feedback.
1) Staging themes / moving to production: we had a client who needed to alter the layout of specific pages in their site regularly for a 24 hour period (then change everything back). We set up a staging server just for these takeovers. It had a physical reproduction of Drupal and the production database. The client would copy the theme under a different name, make adjustments to the CSS, and rsync it over to the production servers through a simple script. Simply changing the theme was not a good idea for their purposes, as they had a lot of blocks that were displaying on different pages in the application. We wrote a small module that duplicated records in the block table for the new theme, then implemented it. We also added some logic for doing this based on time of day.
2) Staging custom modules: the hardest part about staging custom modules is ensuring they work with live data. Selenium is a great tool for writing short functional tests. We did this extensively on a couple of sites and trained the client to construct and perform automated functional tests each time they pushed things into production.
3) Content Staging - One of the things people don't always understand about staging content is that there are ample workflow controls in Drupal to allow users to submit content for editing and review prior to publication. We have used workflow + views to create queues for content that needs to be reviewed, controlled access to those queues based on user role, and used views to track overall workflow statistics based on the time things were published. This has been effective for everything from small non-profits to large publishing sites.
4) Dynamic Content - Sometimes editors want to see the impact content will have on dynamically generated pages before publication. This comes up especially when we are using panels. I have not found a good overall solution and tend to work things out with clients on a case by case basis. Some strategies include writing small modules to export / import nodes after publication, using web services to grab nodes that have been published on a staging site, and cut and paste.
5) User Staging - Occasionally, we have to import large numbers of users on a regular basis. Staging users is always a tricky issue, even when we are using open authentication standards like openid. The issue is typically getting the permissions right; it can become hard to use Drupal's user admin interface when we are talking about 5000 non-alphabetical user records. Typically, we write a script to do the import and modify user permissions.
This does not deal with moving code from dev into staging, but really is just about the QA that needs to happen before moving things into production.
M
On 11 Aug 2008, at 6:35 PM, Michael Haggerty wrote:
1) Themes - changes to a theme should not be made on a live server
2) Modules - esp. if you are developing a custom module, it should not be made on a production server.
I group these into 'packages'. they are even stored in the same table.
Yes Heyrocker, yes! I would love to see these issues being discussed out in the open on the g.d.o. change management group! Everybody come over there!
Seriously though, the sheer number of people who are striving for a solution to this problem has hit critical mass this year. The very email thread you are reading confirms it. It's not a problem that any one of us, nor any single shop, will be able to adequately solve, simply because of the sheer number of use cases that seem to crop up every time a developer cries "Eureka!" or "I have a module in the works for that." Let's begin discussions in earnest on the g.d.o. group, or we will see six contrib modules hit the streets at about the same time that are all 90% cool. My thread on the proposed hook_configuration() is even starting to lean towards a change management scenario. It's time, people, it's time.
-- Joel Farris "I'm not concerned about all hell breaking loose, but that a PART of hell will break loose... it'll be much harder to detect." ~ George Carlin
On Aug 10, 2008, at 9:22 AM, Greg Dunlap wrote:
I'd love to see everyone discussing their problems in a more public setting, not here but perhaps in the Change Management group on g.d.o., so we can pool resources on problems when it makes sense. For all that people seem to be struggling with these issues, that group is oddly dead.
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition. As Sam implied I've been conceptualizing a solution based on this premise, but its been hard finding time to really work on it. Hopefully soon.
Just a quick point on this: if every "thing" had a GUID (that is, a globally unique identifier; PHP's uniqid is sufficient if the prefix parameter is used to identify the host machine's MAC, maybe buttressed with an additional field based on a random number, since otherwise duplicate timestamp-based ids will be created), AND if every "thing" were serializable / unserializable (and here we mustn't forget certain encoding problems), then you would have a realistic proposition. (Serializable: take an in-memory object and turn it into a text object that can be stored in a file, version controlled, deployed via unserialization, etc.) Certainly achievable for nodes, perhaps lots of other stuff, but I don't know about every "thing" in Drupal required for deployment, site merging, etc. Just wanted to extend thoughts in that direction.
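As a rough illustration of the identifier scheme suggested above, the following plain PHP sketch combines a host-derived prefix, uniqid() and a random field. make_guid() is a hypothetical helper, and two substitutions are assumptions of this sketch: the host name (hashed) stands in for the MAC address, since PHP has no built-in way to read one, and mt_rand() supplies the random field. The result is not a standards-conforming UUID.

```php
<?php
// Hypothetical helper: host prefix + uniqid() + random suffix.
function make_guid($host = NULL) {
  if ($host === NULL) {
    // Assumption: the machine's host name replaces the MAC address.
    $host = php_uname('n');
  }
  // Short, fixed-length prefix identifying the originating machine.
  $prefix = substr(md5($host), 0, 8) . '-';
  // uniqid() is based on the current time in microseconds; the second
  // argument asks for additional entropy to reduce collisions.
  $id = uniqid($prefix, TRUE);
  // Extra random field, guarding against ids minted in the same instant.
  return $id . '-' . mt_rand();
}

// Mint a batch of identifiers for one (made-up) host.
$ids = array();
for ($i = 0; $i < 1000; $i++) {
  $ids[] = make_guid('staging.example.com');
}
```

Every id minted for the same host shares the same prefix, so the origin of an object stays visible after it has been moved to another site.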
On Sunday 10 August 2008 08:17:15 Victor Kane wrote:
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition. As Sam implied I've been conceptualizing a solution based on this premise, but its been hard finding time to really work on it. Hopefully soon.
Just a quick point on this: if every "thing" had a GUID (that is, a globally unique identifier; PHP's uniqid is sufficient if the prefix parameter is used to identify the host machine's MAC, maybe buttressed with an additional field based on a random number, since otherwise duplicate timestamp-based ids will be created), AND if every "thing" were serializable / unserializable (and here we mustn't forget certain encoding problems), then you would have a realistic proposition. (Serializable: take an in-memory object and turn it into a text object that can be stored in a file, version controlled, deployed via unserialization, etc.)
Certainly achievable for nodes, perhaps lots of other stuff, but I don't know about every "thing" in Drupal required for deployment, site merging, etc.
Just wanted to extend thoughts in that direction.
So if I've missed this in your/others' thoughts Victor, please forgive me, but...why are we talking about serializing and unserializing data? As you pointed out in your first email:
There is no straightforward way in Drupal to export and import a given content type's nodes.
No off-the-shelf, simple way of doing it.
I agree. So...why talk about it at all? Deploy uses that approach at present, but we needn't be wedded to it. It seems to me that the GUID-based system is the thinnest possible way of allowing Deploy to keep track of the fact that "Thing A" on server 1 is "Thing B" on server 2. Deploy's API shouldn't be aware of what the Things are - just that they need to be checked against each other. And then it's up to the module(s) that created (or altered) the Things to decide what data comprises them, how to check that data against each other on the servers, and then how to resolve differences. The two biggest problems with that are handling data changes which have dependency chains (node system type changes being the most obvious) and that it could potentially make for an API too complex for general consumption. But that seems less insurmountable to me than the import/export problem. Unless I've missed something? Sam
The serialization and unserialization of data is included in my approach to the problem for the purposes of the independent transmission of nodes from one system to another, as in the case of one Drupal site availing itself of a node.save service on another Drupal site. It also has the purpose of guaranteeing insofar as is possible a text version of all entities, configurations, including exported views, content types, panels, hopefully menues and module configurations and exported variables, for the purposes of continued version control and hence deployment also (serialization to text, unserialization to deployment objective). Here of course, serialization and unserialization is not meant in the php function sense, and could include marshaling and unmarshaling to and from XML, and is a cross-language concept. Victor Kane http://awebfactory.com.ar On Sun, Aug 10, 2008 at 8:32 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Sunday 10 August 2008 08:17:15 Victor Kane wrote:
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition. As Sam implied I've been conceptualizing a solution based on this premise, but its been hard finding time to really work on it. Hopefully soon.
Just a quick point on this: if every "thing" had a GUID (that is, a globally unique identifier (PHP's uniqid is sufficient if it the prefix parameter is invoked in order to identify the host machine's MAC, mayble butressed with an addional field based on an aleatory number, since otherwise duplicate timestamp based id's will be created), AND if every "thing" were serializable / unserializable (and here we mustn't forget certain encoding problems). Serializable: take an in memory object and turn it into a text object that can be stored in a file, version controlled, deployed via unserialization, etc.). Then you would have a realistic proposition.
Certainly achievable for nodes, perhaps lots of other stuff, but I don't know if every "thing" in Drupal required for deployment, site merging, etc.
Just wanted to extende thoughts in that direction.
So if I've missed this in your/others' thoughts Victor, please forgive me, but...why are we talking about serializing and unserializing data? As you pointed out in your first email:
There is no straightforward way in Drupal to export and import a given content type's nodes.
No off-the-shelf, simple way of doing it.
I agree. So...why talk about it at all? Deploy uses that approach at present, but we needn't be wedded to it. It seems to me that the GUID-based system is the thinnest possible way of allowing Deploy to keep track of the fact that "Thing A" on server 1 is "Thing B" on server 2. Deploy's API shouldn't be aware of what the Things are - just that they need to be checked against each other. And then it's up to the module(s) that created (or altered) the Things to decide what data comprises them, how to check that data against each other on the servers, and then how to resolve differences. The two biggest problems with that are handling data changes which have dependency chains (node system type changes being the most obvious) and that it could potentially make for an API too complex for general consumption. But that seems less insurmountable to me than the import/export problem.
Unless I've missed something?
Sam
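To make the GUID idea in this exchange concrete, here is a minimal sketch, written in Python purely as neutral notation. Nothing in it is Drupal or Deploy API: the class GuidMap and the functions new_guid, register, guid_for and local_for are hypothetical names invented for the example. It also swaps in a random (version 4) UUID rather than the uniqid-plus-prefix scheme mentioned above, which sidesteps the duplicate-timestamp concern. The point it illustrates is Sam's: the mapping layer only needs to know that "Thing A" on one server is "Thing B" on another, not what the things are.

```python
import uuid


def new_guid() -> str:
    """Return a random 128-bit identifier as text (UUID version 4)."""
    return str(uuid.uuid4())


class GuidMap:
    """Hypothetical per-server table tying a site-local key to a GUID.

    A local key is (item_type, local_id), e.g. ("node", 42).
    """

    def __init__(self):
        self._by_local = {}
        self._by_guid = {}

    def register(self, item_type, local_id, guid=None):
        # Reuse a GUID received from another server, or mint a new one.
        guid = guid or new_guid()
        self._by_local[(item_type, local_id)] = guid
        self._by_guid[guid] = (item_type, local_id)
        return guid

    def guid_for(self, item_type, local_id):
        return self._by_local[(item_type, local_id)]

    def local_for(self, guid):
        # None means this server has never seen the item.
        return self._by_guid.get(guid)


dev = GuidMap()
live = GuidMap()
guid = dev.register("node", 42)    # created on dev with local id 42
live.register("node", 97, guid)    # the same item has local id 97 on live
```

With such a table on each side, an incoming item carries only its GUID; the receiving server looks up its own local id (or finds none and creates the item), so primary keys never have to match across servers.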
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
The serialization and unserialization of data is included in my approach to the problem for the purposes of the independent transmission of nodes from one system to another, as in the case of one Drupal site availing itself of a node.save service on another Drupal site.
It also has the purpose of guaranteeing insofar as is possible a text version of all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables, for the purposes of continued version control and hence deployment also (serialization to text, unserialization to deployment objective).
Here of course, serialization and unserialization is not meant in the php function sense, and could include marshaling and unmarshaling to and from XML, and is a cross-language concept.
Victor Kane http://awebfactory.com.ar
So my initial reaction was that this was actually a disadvantage - it seemed to introduce an extra layer of unnecessary complexity, as it requires pulling the data out of the db, coming up with a new storage format, then transferring that format and reintegrating it into another db. The backup and project-level revisioning control implications are interesting - but that's a wholly different axis from the crux of the deployment paradigm, where there's _one_ version.
However, on further reflection, I can see there being some strong arguments in either direction. Your approach, Victor, makes me drift back to the recent vcs thread on this list, as I can't imagine such a system being feasible and really scalable without the use of something like (gasp!) git. Two basic reasons for that:
1. Data integrity assurance: there's nothing like a SHA1 hash to ensure that nothing gets corrupted in all the combining/recombining of data through and around various servers. And then there's the whole content-addressable filesystem bit - I'd conjecture that it would make git exceptionally proficient at handling either database dumps or tons of data structured from 'export' functionality, whichever the case may be. I imagine Kathleen might be able to speak more to that.
2. Security and logic (and speed): if run on a git backend, I'd speculate that we could use project-specific ssh keys to encrypt all the synchronizations (although that obviously brings up a host of other potential requirements/issues). On the logic end, we could build on top of git's system for organizing sets of commits to allow for different 'types' of syncs (i.e., you're working with different datasets when doing dev <=> qa vs. when you're working with live <=> staging). As for speed...well, I'll just call that a hunch =)
This approach would require a _lot_ of coding, though. The more immediate-term solution that still makes the most sense to me is one where we let modules work straight from the data as it exists in the db, and define some logic for handling synchronizations that the deploy API can manage. But if all the systems are to be implemented...well, then it probably means a pluggable Deploy API that allows for different subsystems to handle different segments of the overall process.
Sam
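The two ideas in play here, exporting an item to text and using a SHA1 hash for integrity, can be sketched together. This is only an illustration under stated assumptions: Python and JSON stand in for whatever format an export would really use (Victor mentions XML as one option), and export_item, import_item and git_blob_sha1 are hypothetical helper names. The hash function follows the way git hashes a blob, namely SHA1 over the header "blob <size>" plus a NUL byte plus the content.

```python
import hashlib
import json


def export_item(item: dict) -> str:
    """Serialize to deterministic text: sorted keys, stable layout."""
    return json.dumps(item, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def import_item(text: str) -> dict:
    """Inverse of export_item."""
    return json.loads(text)


def git_blob_sha1(text: str) -> str:
    """Hash text the way git hashes a blob: 'blob <size>\\0<content>'."""
    data = text.encode("utf-8")
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# A made-up item; the field names are illustrative only.
node = {"guid": "example-guid", "type": "story", "title": "Hello", "body": "First post"}
text = export_item(node)
digest = git_blob_sha1(text)
```

Because the export is deterministic, the same item always produces the same text and therefore the same hash, which is what makes the text diffable under version control and lets a receiving server detect corruption by recomputing the digest.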
Your points are interesting, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
The image I have in my mind is that the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment, especially since the database is usually considered in purely MySQL terms, that is, with fully transparent relationships between tables.
But given concrete client-driven circumstances, of course, in a given instance with a given set of priorities, I can easily see how what you are saying could make sense.
Victor
On Mon, Aug 11, 2008 at 3:51 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
The serialization and unserialization of data is included in my approach to the problem for the purposes of the independent transmission of nodes from one system to another, as in the case of one Drupal site availing itself of a node.save service on another Drupal site.
It also has the purpose of guaranteeing insofar as is possible a text version of all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables, for the purposes of continued version control and hence deployment also (serialization to text, unserialization to deployment objective).
Here of course, serialization and unserialization is not meant in the php function sense, and could include marshaling and unmarshaling to and from XML, and is a cross-language concept.
Victor Kane http://awebfactory.com.ar
So my initial reaction was that this was actually a disadvantage - it seemed to introduce an extra layer of unnecessary complexity, as it requires pulling the data out of the db, coming up with a new storage format, then transferring that format and reintegrating it into another db. The backup and project-level revisioning control implications are interesting - but that's a wholly different axis from the crux of the deployment paradigm, where there's _one_ version.
However, on further reflection, I can see there being some strong arguments in either direction. Your approach, Victor, makes me drift back to the recent vcs thread on this list, as I can't imagine such a system being feasible and really scalable without the use of something like (gasp!) git. Two basic reasons for that:
1. Data integrity assurance: there's nothing like a SHA1 hash to ensure that nothing gets corrupted in all the combining/recombining of data through and around various servers. And then there's the whole content-addressable filesystem bit - I'd conjecture that it would make git exceptionally proficient at handling either database dumps or tons of data structured from 'export' functionality, whichever the case may be. I imagine Kathleen might be able to speak more to that.
2. Security and logic (and speed): if run on a git backend, I'd speculate that we could use project-specific ssh keys to encrypt all the synchronizations (although that obviously brings up a host of other potential requirements/issues). On the logic end, we could build on top of git's system for organizing sets of commits to allow for different 'types' of syncs (i.e., you're working with different datasets when doing dev <=> qa vs. when you're working with live <=> staging). As for speed...well, I'll just call that a hunch =)
This approach would require a _lot_ of coding, though. The more immediate-term solution that still makes the most sense to me is one where we let modules work straight from the data as it exists in the db, and define some logic for handling synchronizations that the deploy API can manage. But if all the systems are to be implemented...well, then it probably means a pluggable Deploy API that allows for different subsystems to handle different segments of the overall process.
Sam
On Monday 11 August 2008 14:38:27 Victor Kane wrote:
Your points are interesting, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
The image I have in my mind is that the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment, especially since the database is usually considered in purely MySQL terms, that is, with fully transparent relationships between tables.
But given concrete client-driven circumstances, of course, in a given instance with a given set of priorities, I can easily see how what you are saying could make sense.
Victor
At this point then, I suspect we're talking past each other. First, are you referring only to the direct-database sync point that I made, or is that also in reference to the drupal-object exporting, git-managed system? If it's the latter as well as the former, then we've _really_ diverged.
Just because I'm suggesting that there are direct-database syncing solutions possible does NOT mean that I'm not thinking in Drupal logic terms. Let me try a different, potentially clearer way:
In your approach, it seems to me that the goal is to use existing drupal API functions - let's use nodes, so node_load() - to fully build a node object. That object is then exported to code, at which point we can do things with vcs, etc. During a deployment, we parse in that code and utilize the API counterpart to our data-getter function, so node_save(), to push the information into the database. Before node_save() is actually run, however, we can use something like a GUID system to change primary keys as necessary so that we've got the node from server A going into the right associated node on server B.
As I said, I can see arguments for that, particularly if it utilizes a git backend. And it only differs from what I'm proposing in a few ways.
What I'm suggesting stems from Greg's original point about a GUID for every 'thing'. 'Things' being, to use your words:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
... all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables ...
Because I agree completely that the _only_ way to arrive at a solution that works for drupal is to think in terms of the 'things' (I'm going to say 'items' from here on out) it makes - not to try to just grab bits of data from here and there and hope it all lines up on the other end. I've always thought that, and am pretty sure I always will.
My proposal is for a deploy API that would let modules define what these items are, and then define a set of behaviors for managing the deployment of those items across a variety of different circumstances. For our concrete example, I suspect the _first_ thing I'd do in the synchro handler for nodes is to call node_load(), then follow it up with some internal logic that...I dunno, there are a lot of ways we could go from there. It could dynamically construct a delta from the last sync time with the remote server; it could just fire over the whole node object. If extension modules need to do something that the node module's deploy handler couldn't work out, no problem - those modules just need to register their interest in deploy transactions related to the GUID for that particular item. On the receiving end, the node module's deploy API implementation knows what to expect coming through the pipe and handles it accordingly - maybe through node_save(), maybe not.
The advantage here, as I see it, is the potential to drill down _very_ quickly on exactly what should be checked during a given changeset. Very quickly as in, potentially, a single query. I can't picture the schema for the deploy items table, so it may take more, but it could be as simple as a single SELECT query that grabs all the items which have been changed/created since the last deploy txn, and that that particular deploy txn is interested in (again, a dev <=> qa txn != staging <=> live txn), and then it's a simple question of iterating through modules that have something to say about how each of the items is deployed.
As I said, I can see ways that a git-driven system can probably provide similar speed when it comes to drilling down to what items need to be considered in a given txn; also, a git-driven system has the added benefit of being able to, even when your local system is offline, still provide the entire version history on demand for _each server_ you've ever connected with on that project. Well, assuming your remote git branches are up to date.
As far as I can tell, this is really the kind of thing you're talking about when you say:
...the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment...
I can think of two very different ways of interpreting that metaphor, both of which are applicable to the topic at hand. I'm hoping, though, that this explanation finally does make clear that I'm _not_ thinking along the lines of 'how do we make an sqldump better?', but instead about methods for making deployment a process that's as smart about drupal data as drupal itself is.
s
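The "single SELECT" idea from this message, followed by iterating over the modules interested in each item, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the message itself says the schema for the deploy items table isn't worked out, so the table deploy_items, its columns, the handler registry, and the names changed_items, deploy_handler and run_txn are all hypothetical. Python and SQLite are used only as neutral notation, not as a proposal for how Deploy would be built.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE deploy_items (
        guid      TEXT PRIMARY KEY,
        item_type TEXT NOT NULL,      -- e.g. 'node', 'view', 'variable'
        changed   INTEGER NOT NULL    -- Unix timestamp of last create/update
    )
""")
db.executemany(
    "INSERT INTO deploy_items VALUES (?, ?, ?)",
    [("g-1", "node", 100), ("g-2", "view", 250),
     ("g-3", "node", 300), ("g-4", "variable", 400)],
)


def changed_items(db, last_txn, item_types):
    """One query: items changed since last_txn that this kind of txn cares about."""
    marks = ",".join("?" for _ in item_types)
    sql = (
        "SELECT guid, item_type FROM deploy_items "
        f"WHERE changed > ? AND item_type IN ({marks}) ORDER BY changed"
    )
    return db.execute(sql, [last_txn, *item_types]).fetchall()


handlers = {}


def deploy_handler(item_type):
    """Decorator with which a module registers interest in an item type."""
    def register(func):
        handlers.setdefault(item_type, []).append(func)
        return func
    return register


@deploy_handler("node")
def send_node(guid):
    return f"node {guid}: send full object or delta"


@deploy_handler("view")
def send_view(guid):
    return f"view {guid}: send exported definition"


def run_txn(db, last_txn, item_types):
    """Ask each registered handler what to do with each changed item."""
    log = []
    for guid, item_type in changed_items(db, last_txn, item_types):
        for handler in handlers.get(item_type, []):
            log.append(handler(guid))
    return log


# A txn that only deploys nodes and views changed after timestamp 200.
staging_to_live = run_txn(db, last_txn=200, item_types=["node", "view"])
```

The item_types argument stands in for the point that different transactions care about different data sets (a dev <=> qa txn is not a staging <=> live txn); the deploy layer itself never inspects what an item contains, it only routes GUIDs to whichever handlers registered for them.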
You make a lot of good points. I guess from here on in, this discussion should move to the prototype arena, working together with Greg, Alex, and others to see what progress can be made.
Victor Kane
On Mon, Aug 11, 2008 at 6:09 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Monday 11 August 2008 14:38:27 Victor Kane wrote:
Your points are interesting, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
The image I have in my mind is that the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment, especially since the database is usually considered in purely MySQL terms, that is, with fully transparent relationships between tables.
But given concrete client-driven circumstances, of course, in a given instance with a given set of priorities, I can easily see how what you are saying could make sense.
Victor
At this point then, I suspect we're talking past each other. First, are you referring only to the direct-database sync point that I made, or is that also in reference to the drupal-object exporting, git-managed system? If it's the latter as well as the former, then we've _really_ diverged.
Just because I'm suggesting that there are direct-database syncing solutions possible does NOT mean that I'm not thinking in Drupal logic terms. Let me try a different, potentially clearer way:
In your approach, it seems to me that the goal is to use existing drupal API functions - let's use nodes, so node_load() - to fully build a node object. That object is then exported to code, at which point we can do things with vcs, etc. During a deployment, we parse in that code and utilize the API counterpart to our data-getter function, so node_save(), to push the information into the database. Before node_save() is actually run, however, we can use something like a GUID system to change primary keys as necessary so that we've got the node from server A going into the right associated node on server B.
As I said, I can see arguments for that, particularly if it utilizes a git backend. And it only differs from what I'm proposing in a few ways.
What I'm suggesting stems from Greg's original point about a GUID for every 'thing'. 'Things' being, to use your words:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
... all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables ...
Because I agree completely that the _only_ way to arrive at a solution that works for drupal is to think in terms of the 'things' (I'm going to say 'items' from here on out) it makes - not to try to just grab bits of data from here and there and hope it all lines up on the other end. I've always thought that, and am pretty sure I always will.
My proposal is for a deploy API that would let modules define what these items are, and then define a set of behaviors for managing the deployment of those items across a variety of different circumstances. For our concrete example, I suspect the _first_ thing I'd do in the synchro handler for nodes is to call node_load(), then follow it up with some internal logic that...I dunno, there are a lot of ways we could go from there. It could dynamically construct a delta from the last sync time with the remote server; it could just fire over the whole node object. If extension modules need to do something that the node module's deploy handler couldn't work out, no problem - those modules just need to register their interest in deploy transactions related to the GUID for that particular item. On the receiving end, the node module's deploy API implementation knows what to expect coming through the pipe and handles it accordingly - maybe through node_save(), maybe not.
The advantage here, as I see it, is the potential to drill down _very_ quickly on exactly what should be checked during a given changeset. Very quickly as in, potentially, a single query. I can't picture the schema for the deploy items table, so it may take more, but it could be as simple as a single SELECT query that grabs all the items which have been changed/created since the last deploy txn, and that that particular deploy txn is interested in (again, a dev <=> qa txn != staging <=> live txn), and then it's a simple question of iterating through modules that have something to say about how each of the items is deployed.
As I said, I can see ways that a git-driven system can probably provide similar speed when it comes to drilling down to what items need to be considered in a given txn; also, a git-driven system has the added benefit of being able to, even when your local system is offline, still provide the entire version history on demand for _each server_ you've ever connected with on that project. Well, assuming your remote git branches are up to date.
As far as I can tell, this is really the kind of thing you're talking about when you say:
...the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment...
I can think of two very different ways of interpreting that metaphor, both of which are applicable to the topic at hand. I'm hoping, though, that this explanation finally does make clear that I'm _not_ thinking along the lines of 'how do we make an sqldump better?', but instead about methods for making deployment a process that's as smart about drupal data as drupal itself is.
s
You make a lot of good points. I guess from here on in, this discussion should move to the prototype arena, and working together with Greg, Alex, and others to see what progress can be made. Victor Kane On Mon, Aug 11, 2008 at 6:09 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Monday 11 August 2008 14:38:27 Victor Kane wrote:
Your points are interestng, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
The image I have in my mind is that the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment, especially since the database is usually considered in purely MySql terms, that is, with fully transparent relationships between tables.
But given concrete client driven circumstances, of course, in a given instance with a given set of priorities, I can easily see how what you are saying could make sense.
Victor
At this point then, I suspect we're talking past each other. First, are you referring only to the direct-database sync point that I made, or is that also in reference to the drupal-object exporting, git-managed system? If it's the latter as well as the former, then we've _really_ diverged.
Just because I'm suggesting that there are direct-database syncing solutions possible does NOT mean that I'm not thinking in Drupal logic terms. Let me try a different, potentially clearer way:
In your approach, it seems to me that the goal is to use existing drupal API functions - let's use nodes, so node_load() - to fully build a node object. That object is then exported to code, at which point we can do things with vcs, etc. During a deployment, we parse in that code and utilize the API counterpart to our data-getter function, so node_save(), to push the information into the database. Before node_save() is actually run, however, we can use something like a GUID system to change primary keys as necessary so that we've got the node from server A going into the right associated node on server B.
As I said, I can see arguments for that, particularly if it utilizes a git backend. And only differs from what I'm proposing in a few ways.
What I'm suggesting stems from Greg's original point about a GUID for every 'thing'. 'Things' being, to use your words:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
... all entities, configurations, including exported views, content types, panels, hopefully menues and module configurations and exported variables ...
Because I agree completely that the _only_ way to arrive at a solution that works for drupal is to think in terms of the 'things' (I'm going to say 'items' from here on out) it makes - not to try to just grab bits of data from here and there and hope it all lines up on the other end. I've always thought that, and am pretty sure I always will.
My proposal is for a deploy API that would let modules define what these items are, and then define a set of behaviors for managing the deployment of those items across a variety of different circumstances. For our concrete example, I suspect the _first_ thing I'd do in the synchro handler for nodes is to call node_load(), then follow it up with some internal logic that...I dunno, there are a lot of ways we could go from there. It could dynamically construct a delta from the last sync time with the remote server; it could just fire over the whole node object. If extension modules need to do something that the node module's deploy handler couldn't work out, no problem - those modules just need to register their interest in deploy transactions related to the GUID for that particular item. On the receiving end, the node module's deploy API implementation knows what to expect coming through the pipe and handles it accordingly - maybe through node_save(), maybe not.
The advantage here, as I see it, is the potential to drill down _very_ quickly on exactly what should be checked during a given changeset. Very quickly as in, potentially, a single query. I can't picture the schema for the deploy items table, so it may take more, but it could be as simple as a single SELECT query that grabs all the items which have been changed/created since the last deploy txn, and that that particular deploy txn is interested in (again, a dev <=> qa txn != staging <=> live txn), and then it's a simple question of iterating through modules that have something to say about how each of the items is deployed.
As I said, I can see ways that a git-driven system can probably provide similar speed when it comes to drilling down to what items need to be considered in a given txn; also, a git-driven system has the added benefit of being able to, even when your local system offline, still provide the entire version history on demand for _each server_ you've ever connected with on that project. Well, assuming your remote git branches are up to date.
As far as I can tell, this is really the kind of thing you're talking about when you say:
...the database is kind of a two-dimensional projection of three dimensions that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment...
I can think of two very different ways of interpreting that metaphor, both of which are applicable to the topic at hand. I'm hoping, though, that this explanation finally does make clear that I'm _not_ thinking along the lines of 'how do we make an sqldump better?', but instead about methods for making deployment a process that's as smart about drupal data as drupal itself is.
s
On Mon, Aug 11, 2008 at 3:51 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
The serialization and unserialization of data is included in my approach to the problem for the purposes of the independent transmission of nodes from one system to another, as in the case of one Drupal site availing itself of a node.save service on another Drupal site.
It also has the purpose of guaranteeing, insofar as is possible, a text version of all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables, for the purposes of continued version control and hence deployment also (serialization to text, unserialization to deployment objective).
Here of course, serialization and unserialization are not meant in the php function sense; they could include marshaling and unmarshaling to and from XML, and are a cross-language concept.
Victor Kane http://awebfactory.com.ar
So my initial reaction was that this was actually a disadvantage - it seemed to introduce an extra layer of unnecessary complexity, as it requires pulling the data out of the db, coming up with a new storage format, then transferring that format and reintegrating it into another db. The backup and project-level revisioning control implications are interesting - but that's a wholly different axis from the crux of the deployment paradigm, where there's _one_ version.
However, on further reflection, I can see there being some strong arguments in either direction. Your approach, Victor, makes me drift back to the recent vcs thread on this list, as I can't imagine such a system being feasible and really scalable without the use of something like (gasp!) git. Two basic reasons for that:
1. Data integrity assurance: there's nothing like a SHA1 hash to ensure that nothing gets corrupted in all the combining/recombining of data through and around various servers. And then there's the whole content-addressable filesystem bit - I'd conjecture that it would make git exceptionally proficient at handling either database dumps or tons of data structured from 'export' functionality, whichever the case may be. I imagine Kathleen might be able to speak more to that.
2. Security and logic (and speed): if run on a git backend, I'd speculate that we could use project-specific ssh keys to encrypt all the synchronizations (although that obviously brings up a host of other potential requirements/issues). On the logic end, we could build on top of git's system for organizing sets of commits to allow for different 'types' of syncs (i.e., you're working with different datasets when doing dev <=> qa vs. when you're working with live <=> staging). As for speed...well, I'll just call that a hunch =)
This approach would require a _lot_ of coding, though. The more immediate-term solution that still makes the most sense to me is one where we let modules work straight from the data as it exists in the db, and define some logic for handling synchronizations that the deploy API can manage. But if all the systems are to be implemented...well, then it probably means a pluggable Deploy API that allows for different subsystems to handle different segments of the overall process.
Sam
You make a lot of good points. I guess from here on in, this discussion should move to the prototype arena, and working together with Greg, Alex, and others to see what progress can be made. On Mon, Aug 11, 2008 at 6:09 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Monday 11 August 2008 14:38:27 Victor Kane wrote:
Your points are interesting, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
The image I have in my mind is that the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment, especially since the database is usually considered in purely MySQL terms, that is, with fully transparent relationships between tables.
But given concrete client driven circumstances, of course, in a given instance with a given set of priorities, I can easily see how what you are saying could make sense.
Victor
At this point then, I suspect we're talking past each other. First, are you referring only to the direct-database sync point that I made, or is that also in reference to the drupal-object exporting, git-managed system? If it's the latter as well as the former, then we've _really_ diverged.
Just because I'm suggesting that there are direct-database syncing solutions possible does NOT mean that I'm not thinking in Drupal logic terms. Let me try a different, potentially clearer way:
In your approach, it seems to me that the goal is to use existing drupal API functions - let's use nodes, so node_load() - to fully build a node object. That object is then exported to code, at which point we can do things with vcs, etc. During a deployment, we parse in that code and utilize the API counterpart to our data-getter function, so node_save(), to push the information into the database. Before node_save() is actually run, however, we can use something like a GUID system to change primary keys as necessary so that we've got the node from server A going into the right associated node on server B.
As I said, I can see arguments for that, particularly if it utilizes a git backend. And it only differs from what I'm proposing in a few ways.
What I'm suggesting stems from Greg's original point about a GUID for every 'thing'. 'Things' being, to use your words:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
... all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables ...
Because I agree completely that the _only_ way to arrive at a solution that works for drupal is to think in terms of the 'things' (I'm going to say 'items' from here on out) it makes - not to try to just grab bits of data from here and there and hope it all lines up on the other end. I've always thought that, and am pretty sure I always will.
My proposal is for a deploy API that would let modules define what these items are, and then define a set of behaviors for managing the deployment of those items across a variety of different circumstances. For our concrete example, I suspect the _first_ thing I'd do in the synchro handler for nodes is to call node_load(), then follow it up with some internal logic that...I dunno, there are a lot of ways we could go from there. It could dynamically construct a delta from the last sync time with the remote server; it could just fire over the whole node object. If extension modules need to do something that the node module's deploy handler couldn't work out, no problem - those modules just need to register their interest in deploy transactions related to the GUID for that particular item. On the receiving end, the node module's deploy API implementation knows what to expect coming through the pipe and handles it accordingly - maybe through node_save(), maybe not.
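A minimal sketch of the registration idea described above, written in Python rather than Drupal's PHP purely for illustration. Every name here (`DeployRegistry`, `register_item_type`, `register_interest`, the `dev:node:12` GUID format) is hypothetical and is not part of Drupal or the Deploy module. It assumes the simplest strategy mentioned in the text, sending the whole object, and shows how an extension module might hook into the transaction for one particular item.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DeployRegistry:
    """Hypothetical registry: modules declare item types and per-item interest."""
    # item type -> primary handler that builds the outgoing payload
    handlers: dict = field(default_factory=dict)
    # item GUID -> extra callbacks from modules interested in that item
    interested: dict = field(default_factory=lambda: defaultdict(list))

    def register_item_type(self, item_type, handler):
        self.handlers[item_type] = handler

    def register_interest(self, guid, callback):
        self.interested[guid].append(callback)

    def build_payload(self, item_type, guid, item):
        # The owning module's handler goes first, then every interested module.
        payload = self.handlers[item_type](item)
        for callback in self.interested[guid]:
            payload = callback(payload)
        return payload


def node_handler(node):
    # Simplest strategy from the text: fire over the whole object.
    return {"type": "node", "data": dict(node)}


def add_location(payload):
    # An extension module attaching data the node handler knows nothing about.
    payload["data"]["location"] = "example"
    return payload


registry = DeployRegistry()
registry.register_item_type("node", node_handler)
registry.register_interest("dev:node:12", add_location)
payload = registry.build_payload("node", "dev:node:12", {"title": "Hello"})
```

A delta-based handler could be swapped in for `node_handler` without changing how interested modules register, which is the flexibility the proposal seems to be after.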
The advantage here, as I see it, is the potential to drill down _very_ quickly on exactly what should be checked during a given changeset. Very quickly as in, potentially, a single query. I can't picture the schema for the deploy items table, so it may take more, but it could be as simple as a single SELECT query that grabs all the items which have been changed/created since the last deploy txn, and that that particular deploy txn is interested in (again, a dev <=> qa txn != staging <=> live txn), and then it's a simple question of iterating through modules that have something to say about how each of the items is deployed.
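To make the "single SELECT" idea concrete, here is a rough sketch using Python's built-in sqlite3 module. The schema is entirely an assumption, since the message itself says the deploy items table has not been designed: `deploy_items` and `deploy_txn_interest`, along with their columns, are hypothetical names invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: one row per deployable item, keyed by GUID.
CREATE TABLE deploy_items (guid TEXT PRIMARY KEY, item_type TEXT, changed INTEGER);
-- Hypothetical: which item types each kind of deploy txn cares about.
CREATE TABLE deploy_txn_interest (txn_type TEXT, item_type TEXT);
INSERT INTO deploy_items VALUES
  ('dev:node:1', 'node', 100),
  ('dev:node:2', 'node', 250),
  ('dev:view:1', 'view', 300),
  ('dev:user:7', 'user', 400);
INSERT INTO deploy_txn_interest VALUES
  ('staging_to_live', 'node'),
  ('staging_to_live', 'view'),
  ('dev_to_qa', 'user');
""")


def changed_items(conn, txn_type, last_deploy):
    """Items changed since the last deploy that this txn type is interested in."""
    rows = conn.execute(
        """SELECT i.guid, i.item_type
           FROM deploy_items i
           JOIN deploy_txn_interest t ON t.item_type = i.item_type
           WHERE t.txn_type = ? AND i.changed > ?
           ORDER BY i.changed""",
        (txn_type, last_deploy),
    )
    return rows.fetchall()


result = changed_items(conn, "staging_to_live", 200)
```

The returned rows would then be handed to whichever modules have something to say about each item type.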
As I said, I can see ways that a git-driven system can probably provide similar speed when it comes to drilling down to what items need to be considered in a given txn; also, a git-driven system has the added benefit of being able to, even when your local system is offline, still provide the entire version history on demand for _each server_ you've ever connected with on that project. Well, assuming your remote git branches are up to date.
As far as I can tell, this is really the kind of thing you're talking about when you say:
...the database is kind of a two-dimensional projection of three dimensions; that is, there may be many hidden relationships in the process side of things that have to be taken into account for deployment...
I can think of two very different ways of interpreting that metaphor, both of which are applicable to the topic at hand. I'm hoping, though, that this explanation finally does make clear that I'm _not_ thinking along the lines of 'how do we make an sqldump better?', but instead about methods for making deployment a process that's as smart about drupal data as drupal itself is.
s
On Mon, Aug 11, 2008 at 3:51 PM, Sam Boyer <drupal@samboyer.org> wrote:
On Monday 11 August 2008 05:33:49 Victor Kane wrote:
The serialization and unserialization of data is included in my approach to the problem for the purposes of the independent transmission of nodes from one system to another, as in the case of one Drupal site availing itself of a node.save service on another Drupal site.
It also has the purpose of guaranteeing, insofar as is possible, a text version of all entities, configurations, including exported views, content types, panels, hopefully menus and module configurations and exported variables, for the purposes of continued version control and hence deployment also (serialization to text, unserialization to deployment objective).
Here of course, serialization and unserialization are not meant in the php function sense; they could include marshaling and unmarshaling to and from XML, and are a cross-language concept.
Victor Kane http://awebfactory.com.ar
So my initial reaction was that this was actually a disadvantage - it seemed to introduce an extra layer of unnecessary complexity, as it requires pulling the data out of the db, coming up with a new storage format, then transferring that format and reintegrating it into another db. The backup and project-level revisioning control implications are interesting - but that's a wholly different axis from the crux of the deployment paradigm, where there's _one_ version.
However, on further reflection, I can see there being some strong arguments in either direction. Your approach, Victor, makes me drift back to the recent vcs thread on this list, as I can't imagine such a system being feasible and really scalable without the use of something like (gasp!) git. Two basic reasons for that:
1. Data integrity assurance: there's nothing like a SHA1 hash to ensure that nothing gets corrupted in all the combining/recombining of data through and around various servers. And then there's the whole content-addressable filesystem bit - I'd conjecture that it would make git exceptionally proficient at handling either database dumps or tons of data structured from 'export' functionality, whichever the case may be. I imagine Kathleen might be able to speak more to that.
2. Security and logic (and speed): if run on a git backend, I'd speculate that we could use project-specific ssh keys to encrypt all the synchronizations (although that obviously brings up a host of other potential requirements/issues). On the logic end, we could build on top of git's system for organizing sets of commits to allow for different 'types' of syncs (i.e., you're working with different datasets when doing dev <=> qa vs. when you're working with live <=> staging). As for speed...well, I'll just call that a hunch =)
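As a small illustration of the data-integrity point in item 1, the sketch below computes the SHA-1 id git assigns to a blob (the hash of a `blob <size>\0` header followed by the content) for an exported item. The JSON export format and the sample node are assumptions made for the example; nothing in the thread settles on a serialization format.

```python
import hashlib
import json


def git_blob_id(content: bytes) -> str:
    """SHA-1 id git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical export: a node serialized to stable, sorted-key JSON text.
node = {"title": "Hello", "body": "First post", "type": "story"}
exported = json.dumps(node, sort_keys=True).encode("utf-8")

sent_id = git_blob_id(exported)

# The receiving server recomputes the id; any corruption in transit changes it.
corrupted = exported.replace(b"First", b"Fxrst")
received_ok = git_blob_id(exported) == sent_id
received_bad = git_blob_id(corrupted) == sent_id
```

Because the id is derived purely from content, two servers holding identical exports would agree on the id without comparing the data itself.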
This approach would require a _lot_ of coding, though. The more immediate-term solution that still makes the most sense to me is one where we let modules work straight from the data as it exists in the db, and define some logic for handling synchronizations that the deploy API can manage. But if all the systems are to be implemented...well, then it probably means a pluggable Deploy API that allows for different subsystems to handle different segments of the overall process.
Sam
On Monday 11 August 2008 2:38:27 pm Victor Kane wrote:
Your points are interesting, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
One of the take-aways from the Data API Design Sprint back in February was that we need to move away from assuming that everything is in a single local SQL database. That is simply far too limiting.
Any import/export/deploy mechanism will absolutely need to operate at the entity level, not the database level, because there's no guarantee that there will always be a local SQL database in which these things are stored.
-- Larry Garfield larry@garfieldtech.com
Excellent! On Mon, Aug 11, 2008 at 10:15 PM, Larry Garfield <larry@garfieldtech.com> wrote:
On Monday 11 August 2008 2:38:27 pm Victor Kane wrote:
Your points are interesting, but I think there may be a lot to what Greg Dunlap is recommending in terms of thinking in Drupal logic terms and not database terms.
One of the take-aways from the Data API Design Sprint back in February was that we need to move away from assuming that everything is in a single local SQL database. That is simply far too limiting.
Any import/export/deploy mechanism will absolutely need to operate at the entity level, not the database level, because there's no guarantee that there will always be a local SQL database in which these things are stored.
-- Larry Garfield larry@garfieldtech.com
On Sunday 10 August 2008, Greg Dunlap wrote:
This whole problem would be much simpler if every "thing" had a GUID. Get that in place, and building a system to do deployment in the way described above actually becomes a realistic proposition.
I hope I am not stating the obvious here, but you can get yourself a pseudo-GUID if you ask the user for a string that identifies the installation (e.g. my-dev, my-staging, my-production) and combine that with the object's internal id to create a GUID (no hashing necessary).
When you import objects INTO an installation, you keep a table that maps the remote object GUID to the local object GUID. Then you have a well-defined one-to-one mapping, and you are able to update the object again, refer to it (this comment is for that node), or even send it back if needed. You can also keep track of which object came from where if you ever need this kind of auditing.
I don't trust arbitrary strings like domain, IP, MAC etc., as they have a tendency to change. Asking the user for a unique id (and it only has to be unique in his world) should be good enough here.
-- Yuval Hager [@] yuval@avramzon.net
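The pseudo-GUID scheme above could be sketched roughly as follows, in Python for illustration only. The `installation:type:id` format, the `make_guid` helper and the `ImportMap` class are all hypothetical; the message specifies only "installation string plus internal id" and a remote-to-local mapping table.

```python
def make_guid(installation_id, object_type, local_id):
    """Pseudo-GUID: user-chosen installation name plus the internal id."""
    return f"{installation_id}:{object_type}:{local_id}"


class ImportMap:
    """Hypothetical stand-in for the remote-GUID -> local-GUID mapping table."""

    def __init__(self, installation_id):
        self.installation_id = installation_id
        self.remote_to_local = {}

    def record(self, remote_guid, object_type, local_id):
        # Called after an imported object has been saved under a local id.
        local_guid = make_guid(self.installation_id, object_type, local_id)
        self.remote_to_local[remote_guid] = local_guid
        return local_guid

    def lookup(self, remote_guid):
        # Lets a later import update the same object instead of creating a copy.
        return self.remote_to_local.get(remote_guid)

    def origin_of(self, local_guid):
        # Auditing: which remote object did this local object come from?
        for remote, local in self.remote_to_local.items():
            if local == local_guid:
                return remote
        return None


production = ImportMap("my-production")
remote = make_guid("my-staging", "node", 12)
local = production.record(remote, "node", 57)
```

On a second import of `my-staging:node:12`, `production.lookup(remote)` finds the existing local object, so the importer can update it in place.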
On Sunday 10 August 2008, Greg Dunlap wrote:
When I created Deploy I had a set of goals in mind that I felt a solution had to meet. I think (hope) that most people will agree that these goals represent the ideal of what any staging and deployment system should achieve. Some of this will repeat some of what has already been said insightfully and thoughtfully above, but I kind of wanted it all put together in one place.
[snip]
I've been thinking a lot about this thread. It has developed into a very informative and important discussion. I was not aware of the deploy module, and it definitely looks like a very big step in the right direction.
Taking this a step further, I could *not* imagine myself deploy-ing data onto a client's live production site, although I happily use 'svn export' on such a site for code. Making all sorts of changes to the DB in bulk is very frightening to do on production, and something I will try to avoid as much as I can.
Assuming I am not the only one with these thoughts, I tried to analyze the basis of that hesitation. SVN does version management. I trust the tool: if I make a catastrophic change on production, I can always switch back to the old version and leave production in a stable state. I can also provide a log of changes and create a diff to know what has changed.
In order to be able to use a deploy style solution on a production server, IMHO, it must provide a trackable, version control style of logging and rollbacks. Somebody in this thread mentioned another VCS that might be used - maybe this is what he meant; I am not sure.
Have you considered this type of version control handling for deploy? Or maybe this is achieved through the serialization that was discussed elsewhere in this thread?
--yuval
On Aug 12, 2008, at 5:27 AM, Yuval Hager wrote:
On Sunday 10 August 2008, Greg Dunlap wrote:
When I created Deploy I had a set of goals in mind that I felt a solution had to meet. I think (hope) that most people will agree that these goals represent the ideal of what any staging and deployment system should achieve. Some of this will repeat some of what has already been said insightfully and thoughtfully above, but I kind of wanted it all put together in one place.
[snip] [...]
Have you considered this type of version control handling for deploy? Or maybe this is achieved through the serialization that was discussed elsewhere in this thread?
I can't speak for deploy, but this is definitely one of the three major use cases I have in mind when thinking of exporting/importing site structure:
- deployment
- install profiles (or module package configurations)
- version control
One additional thing I'm having in the back of my head is doing the same thing with content: Once we've figured out what's structure in our sites, the rest is content, right?
Alex Barth http://www.developmentseed.org/blog tel (202) 250-3633
On Tue, Aug 12, 2008 at 8:09 AM, Alex Barth <alex@developmentseed.org> wrote:
One additional thing I'm having in the back of my head is doing the same thing with content: Once we've figured out what's structure in our sites, the rest is content, right?
I have classified four types of data:
* Configuration (structure)
* Content
* User
* Ignorable
Configuration are things like the variables, blocks, cck type and field definitions, etc (you can configure settings.php to have unique variable settings on dev and prod). Content is nodes, taxonomy, menu, path, etc.
User data is users, comments, search, statistics, watchdog, etc. Basically, anything that tracks what a user is doing (this is also stuff you generally don't need to track in development).
Ignorable data is cache and sessions - neither of which is drastically harmful to lose. You don't store the data, but you also don't erase it. It lives only in the live database, and if the live database is lost, it's not so bad to have to regenerate that data.
The other possible "ignorable" data is the search index - this is sort of a personal preference, I think. By default, I have it categorized as user data, but I think it could be ignored if needed for performance reasons.
In a data classification plan, I think it would be important to leave it flexible for a user to choose their own classification. The search index is one example of personal preference.
--- Kathleen Murtagh
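One way to picture the four-way classification with user overrides is a default lookup table that a site can selectively override. The sketch below is in Python for illustration; the table names are illustrative stand-ins for Drupal tables rather than an authoritative list, and `classify` is a hypothetical helper.

```python
# Default classification, following the four categories described above.
# Table names are illustrative assumptions, not a complete or verified list.
DEFAULT_CLASSES = {
    "variable": "configuration",
    "blocks": "configuration",
    "node": "content",
    "term_data": "content",
    "menu_links": "content",
    "url_alias": "content",
    "users": "user",
    "comments": "user",
    "watchdog": "user",
    "search_index": "user",
    "cache": "ignorable",
    "sessions": "ignorable",
}


def classify(table, overrides=None):
    """Return the data class for a table; site-specific overrides win."""
    if overrides and table in overrides:
        return overrides[table]
    return DEFAULT_CLASSES.get(table, "unclassified")


# A site that prefers to treat the search index as ignorable for performance.
site_overrides = {"search_index": "ignorable"}
default_choice = classify("search_index")
site_choice = classify("search_index", site_overrides)
```

Returning "unclassified" for unknown tables forces an explicit decision for contributed modules instead of silently guessing.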
In your model taxonomy is "content" while variables are "configuration". Let's say on my site, one of my taxonomies is "privacy" with terms like "public", "administrator only", etc... Further I configure one of the taxonomy access control modules to use my privacy terms.
In my mind those terms are part of "configuration" (while other terms might be considered "content").
While in your model as I understand it, I now have some variables (configuration, of the access control module) referring to my term ids (content). So the line is crossed, I can't replicate the configuration unless I also replicate the content. If I understand you correctly.
I say this trying to point out that we can't simply divide Drupal's database into "content" tables and "configuration" tables. Unfortunately it will never be that simple. There are many examples like this where the line between content and configuration is fuzzy.
While I'm pessimistic about that particular idea, I'm optimistic about the UUID idea being discussed in the "Unique/Random IDs and drupal" thread. I think using UUIDs rather than sequential ids will solve a big huge chunk of the dev->staging->live problem.
-Dave
On Tuesday 12 August 2008, Kathleen Murtagh wrote:
Configuration are things like the variables, blocks, cck type and field definitions, etc (you can configure settings.php to have unique variable settings on dev and prod). Content is nodes, taxonomy, menu, path, etc.
On Tue, Aug 12, 2008 at 1:33 PM, Dave Cohen <drupal@dave-cohen.com> wrote:
<snip> While in your model as I understand it, I now have some variables (configuration, of the access control module) referring to my term ids (content). So the line is crossed, I can't replicate the configuration unless I also replicate the content. If I understand you correctly.
I say this trying to point out that we can't simply divide Drupal's database into "content" tables and "configuration" tables. Unfortunately it will never be that simple. There are many examples like this where the line between content and configuration is fuzzy.
Dave, you are correct. It is for this reason that I took the "merge" approach. I have been thinking very deeply about this topic for about a year now, so I may not so quickly 'get' everything that is being discussed, as I'm likely in a rut in my thinking. However, I feel my role may then be best to at the very least provide what I have learned for those who do 'get' the whole issue.
The way Drupal behaves, creating this fuzzy division between configuration and content, makes this whole process difficult. This is the core of the problem; otherwise we could just copy tables back and forth.
As I see it, UUIDs would really only solve one problem of the whole package - the sequences issue. It's, of course, a great problem to solve; however, there are other issues.
One of the things that likely needs to be supported in any system is modifying content data in both development and production. I think it is a long way off before this fuzzy division will go away. In all of the sites I have developed, I have in some way touched and modified nodes during development of a site that has a live counterpart. Your taxonomy example is great, as it can be used both for configuration (access control) and content (free tagging) on one website. Yikes.
If you allow modification of content data on both development and production, then it opens a whole can of worms when merging that data. The biggest issue is figuring out what is "new" content and what is "deleted" content. If I delete something in development, is that deletion eliminated because it is considered "new" content from production? If I delete something in production, how do I push that back to development without development inadvertently pushing it back to production? Then, ick, what if the same piece of content is modified on both production and development (let's say the menus as the most likely culprit)?
(please excuse me if this was a double post)
--- Kathleen Murtagh
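The "new versus deleted" ambiguity raised above is the classic case for a three-way comparison against a common base. To be clear, this is a swapped-in technique for illustration, not the merge approach described in the message: it assumes a snapshot of the state at the last sync is kept, and that items have ids that are stable across sites (for example the UUIDs discussed elsewhere in the thread). All names are hypothetical.

```python
MISSING = object()  # sentinel: the item does not exist on that side


def classify_changes(base, dev, prod):
    """Three-way comparison of item states keyed by a cross-site stable id."""
    decisions = {}
    for key in sorted(set(base) | set(dev) | set(prod)):
        b = base.get(key, MISSING)
        d = dev.get(key, MISSING)
        p = prod.get(key, MISSING)
        if d == p:
            continue  # both sides agree; nothing to merge
        if d == b:
            decisions[key] = "take production"   # only production changed it
        elif p == b:
            decisions[key] = "take development"  # only development changed it
        else:
            decisions[key] = "conflict"          # changed differently on both
    return decisions


base = {"a": 1, "b": 2, "c": 3}           # state at the last sync
dev = {"a": 1, "c": 30, "d": 4}           # b deleted, c edited, d created
prod = {"a": 1, "b": 2, "c": 31, "e": 5}  # c edited, e created

decisions = classify_changes(base, dev, prod)
```

With the base snapshot available, item "b" is recognizably a deletion in development rather than new content from production, while item "c", edited on both sides, still needs a human or a module-specific rule to resolve.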
On Aug 12, 2008, at 1:33 PM, Dave Cohen wrote:
In your model taxonomy is "content" while variables are "configuration". Let's say on my site, one of my taxonomies is "privacy" with terms like "public", "administrator only", etc... Further I configure one of the taxonomy access control modules to use my privacy terms.
In my mind those terms are part of "configuration" (while other terms might be considered "content").
While really convenient, I'm starting to consider using taxonomy as configuration as bad practice. In the long run, building effective import/export and migration tools will also mean having a clean distinction between what's content and what's configuration on the Drupal level...
While in your model as I understand it, I now have some variables (configuration, of the access control module) referring to my term ids (content). So the line is crossed, I can't replicate the configuration unless I also replicate the content. If I understand you correctly.
I say this trying to point out that we can't simply divide Drupal's database into "content" tables and "configuration" tables. Unfortunately it will never be that simple. There are many examples like this where the line between content and configuration is fuzzy.
While I'm pessimistic about that particular idea, I'm optimistic about the UUID idea being discussed in the "Unique/Random IDs and drupal" thread. I think using UUIDs rather than sequential ids will solve a big huge chunk of the dev->staging->live problem.
-Dave
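To make the sequences issue concrete, here is a minimal sketch of why sequential ids collide when two copies of a site both create items, and why UUIDs do not. The helper `next_serial` is a hypothetical stand-in for an auto-increment column, not Drupal code.

```python
import uuid


def next_serial(existing_ids):
    """Hypothetical stand-in for an auto-increment column."""
    return max(existing_ids, default=0) + 1


# Both sites start from the same copy of the database.
shared_ids = [1, 2, 3]

# Each site creates one new item independently of the other.
dev_id = next_serial(shared_ids)
prod_id = next_serial(shared_ids)
# dev_id and prod_id are the same number for two different items,
# so the rows cannot simply be copied across.

# Random UUIDs are generated without coordination between the sites.
dev_uuid = uuid.uuid4()
prod_uuid = uuid.uuid4()
```

This only addresses identity. A variable that refers to a term still needs that term to exist on the target site, which is the dependency problem described above.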
On Tuesday 12 August 2008, Kathleen Murtagh wrote:
Configuration is things like variables, blocks, CCK type and field definitions, etc. (you can configure settings.php to have unique variable settings on dev and prod). Content is nodes, taxonomy, menu, path, etc.
Alex Barth http://www.developmentseed.org/blog tel (202) 250-3633
Quoting Alex Barth <alex@developmentseed.org>:
While really convenient, I'm starting to consider using taxonomy as configuration as bad practice.
In the long run, building effective import/export and migration tools will also mean having a clean distinction between what's content and what's configuration on the Drupal level...
But you just need to know which vocabularies are configuration; so you design the tool to take it into account. Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
On Aug 14, 2008, at 11:23 AM, Earnie Boyd wrote:
Quoting Alex Barth <alex@developmentseed.org>:
While really convenient, I'm starting to consider using taxonomy as configuration as bad practice.
In the long run, building effective import/export and migration tools will also mean having a clean distinction between what's content and what's configuration on the Drupal level...
But you just need to know which vocabularies are configuration; so you design the tool to take it into account.
sure - we have to face reality, right? :)
Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
Alex Barth http://www.developmentseed.org/blog tel (202) 250-3633
On 10 Aug 2008, at 1:22 AM, Sam Boyer wrote:
Given that heyrocker/Greg is now at Palantir, deploy is on our minds...let me try to not steal any of his thunder :)
One of the things I would like to see in deploy, or any system we end up using, is the ability to generate a module with a .install file for the configuration info in install and update_x hooks. This allows developers to version their code and configuration with their version tracking system, and roll out new releases the same way they would normally.
On Saturday 09 August 2008 20:09:59 Adrian Rossouw wrote:
On 10 Aug 2008, at 1:22 AM, Sam Boyer wrote:
Given that heyrocker/Greg is now at Palantir, deploy is on our minds...let me try to not steal any of his thunder :)
One of the things I would like to see in deploy, or any system we end up using, is the ability to generate a module with a .install file for the configuration info in install and update_x hooks.
This allows developers to version their code and configuration with their version tracking system, and roll out new releases the same way they would normally.
Sorry, I'm not grokking this idea. What's the use case for generating modules and associated update hooks?
On 10 Aug 2008, at 4:41 PM, Sam Boyer wrote:
Sorry, I'm not grokking this idea. What's the use case for generating modules and associated update hooks?
Checking them into your own revisioning system with your own release and QA structures, distributing configurations and more.
Basically keeping your releases in line with each other, so you don't have the code and then a floating database synch process which doesn't match up directly to your code, so you wouldn't have sites trying to communicate with each other using different versions of modules (which might cause incompatibilities).
This way, you upgrade all the sites and the code the way you normally would: svn update; run update.php.
Being able to take any site and say 'You are an install profile now', AND keeping that install profile in synch across later versions without having to subscribe all the sites using the install profile to the main synch (many of those sites might not even be yours).
Basically, the dump-to-module thing takes the long view. Once you have done your development, and you actually want to deploy it in one or more places, you have a single atomic package of everything you need to get the site up and running, to which people can add their own content.
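As a rough illustration of the "dump to module" idea, the sketch below renders a set of changed variables as the PHP source of one update hook, ready to be committed next to the code. It is an assumption-laden sketch, not how deploy or any other module works: the module name `mysite_config`, the update number, and the helper names are hypothetical, it handles string values only, and it assumes Drupal 6 conventions (`variable_set()` and an update hook that returns an array).

```python
def php_quote(value):
    """Quote a Python string as a PHP single-quoted string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_update_hook(module, number, variables):
    """Return PHP source for one update hook that sets the given variables.

    `variables` maps variable names to string values. The generated hook
    follows the Drupal 6 convention of returning an array.
    """
    lines = ["function %s_update_%d() {" % (module, number)]
    for name in sorted(variables):
        lines.append(
            "  variable_set(%s, %s);" % (php_quote(name), php_quote(variables[name]))
        )
    lines.append("  return array();")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical module name and update number, for illustration only.
source = render_update_hook(
    "mysite_config", 6001, {"site_name": "Dave's site", "site_slogan": "Staging works"}
)
```

The generated text would go into the module's .install file, so rolling out the configuration is the same `svn update; run update.php` step described above.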
On Sunday 10 August 2008 11:09:14 Adrian Rossouw wrote:
On 10 Aug 2008, at 4:41 PM, Sam Boyer wrote:
Sorry, I'm not grokking this idea. What's the use case for generating modules and associated update hooks?
checking them into your own revisioning system with your own release and qa structures, distributing configurations and more.
Basically keeping your releases in line with each other, so you don't have the code, and then a floating database synch process which doesn't match up directly to your code, so you wouldn't have sites trying to communicate with each other using different versions of modules (which might cause incompatibilities).
This way, you upgrade all the sites and the code the way you normally would : svn update; run update.php.
Being able to take any site and going 'You are an install profile now', AND keeping that install profile in synch across later versions without having to subscribe all the sites using the install profile to the main synch. (many of those sites might not even be yours)
Basically, the dump to module thing takes the long view. Once you have done your development, and you actually want to deploy it in one or more places, you have a single atomic package of everything you need to get the site up and running, to which people can add their own content.
Gotcha. Ooooh. That is a nifty direction, and definitely one I hadn't thought of. In our 'holy grail' server workflow, I did have the idea that different parts of the data would flow differently to different servers - but indeed, a natural next step would be extending it to generating install profiles. Deploy wouldn't know the difference (nor would it care). s
I've written about my personal approach in the past: http://www.dave-cohen.com/node/1779. I use hook_form_alter so the update.php form always executes one of my update_x hooks, and that hook imports a file with the latest database changes. I wrote about it here at the time, and received a lot of "you should never use hook_form_alter in your .install file" comments. But I do it anyway. So I agree that a solution should be able to apply changes during update.php.
-Dave
On Saturday 09 August 2008, Adrian Rossouw wrote:
One of the things I would like to see in deploy, or any system we end up using, is the ability to generate a module with a .install file for the configuration info in install and update_x hooks.
I was about to suggest this, but Adrian got there first:
On Aug 9, 2008, at 8:09 PM, Adrian Rossouw wrote:
One of the things I would like to see in deploy, or any system we end up using, is the ability to generate a module with a .install file for the configuration info in install and update_x hooks.
This allows developers to version their code and configuration with their version tracking system, and roll out new releases the same way they would normally.
So my contribution to this thread will be a most-helpful +1. This is how we do pretty much everything around here, and I'd be disappointed in any solution that wasn't equally as useful. Thanks! Allie
On Tue, Aug 12, 2008 at 2:41 PM, Allie Micka <allie@pajunas.com> wrote:
On Aug 9, 2008, at 8:09 PM, Adrian Rossouw wrote:
One of the things I would like to see in deploy, or any system we end up using, is the ability to generate a module with a .install file for the configuration info in install and update_x hooks.
This allows developers to version their code and configuration with their version tracking system, and roll out new releases the same way they would normally.
So my contribution to this thread will be a most-helpful +1. This is how we do pretty much everything around here, and I'd be disappointed in any solution that wasn't equally as useful.
Another +1. This is along the lines of what I meant when I said I am dreaming of having automated database migrations for Drupal. Something quick and easy where someone could say, "Ok, everything I just did, export it". Then the next developer can add importing those changes as part of their update workflow. Something where, a month along in development, someone could say, "You know, that commit was bad" and remove that commit from the system, including any database changes that were made. --- Kathleen Murtagh
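One way to picture "export everything I just did" and "remove that commit" is to store each change set as a named migration and rebuild the configuration by replaying them in order. This is a minimal sketch under that assumption, not an existing Drupal facility; `replay`, the migration names, and the configuration keys are all hypothetical.

```python
def replay(base, migrations, skip=()):
    """Rebuild a configuration by applying named change sets in order.

    `base` is the configuration before any migration, `migrations` is an
    ordered list of (name, changes) pairs, and `skip` names the change
    sets to leave out, for example a commit that turned out to be bad.
    """
    config = dict(base)  # never modify the caller's base configuration
    for name, changes in migrations:
        if name not in skip:
            config.update(changes)
    return config


base = {"site_name": "Old name"}
migrations = [
    ("001_views", {"frontpage_view": "news"}),
    ("002_bad", {"site_name": "Oops"}),
    ("003_blocks", {"sidebar_block": "recent"}),
]

everything = replay(base, migrations)
without_bad = replay(base, migrations, skip={"002_bad"})
```

Replaying from a known base avoids writing an explicit "undo" for every change, but it only works cleanly when later change sets do not depend on the one being removed.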
On Aug 12, 2008, at 4:52 PM, Kathleen Murtagh wrote:
On Tue, Aug 12, 2008 at 2:41 PM, Allie Micka <allie@pajunas.com> wrote:
On Aug 9, 2008, at 8:09 PM, Adrian Rossouw wrote:
One of the things I would like to see in deploy, or any system we end up using, is the ability to generate a module with a .install file for the configuration info in install and update_x hooks.
This allows developers to version their code and configuration with their version tracking system, and roll out new releases the same way they would normally.
So my contribution to this thread will be a most-helpful +1. This is how we do pretty much everything around here, and I'd be disappointed in any solution that wasn't equally as useful.
Another +1. This is along the lines of what I meant when I said I am dreaming of having automated database migrations for Drupal.
I really urge you to take a look at http://cvs.drupal.org/viewvc.py/drupal/contributions/sandbox/alex_b/port/ and let me know what you're thinking. I built port module with exactly that in mind: code in version control, generate install profiles, export/import functions for deployment modules. I got in touch with Boris on how to join forces over port and install profile api - because that's the module I see most overlap with.
Something quick and easy where someone could say, "Ok, everything I just did, export it". Then the next developer can add importing those changes as part of their update workflow.
Something where, a month along in development, someone could say, "You know, that commit was bad" and remove that commit from the system, including any database changes that were made.
--- Kathleen Murtagh
Alex Barth http://www.developmentseed.org/blog tel (202) 250-3633
Quoting Dave Cohen <drupal@dave-cohen.com>:
Personally, I don't see how two DBs improves things. In my experience, nodes are often "configuration" as well as "content". Trying to draw that line somewhere is a mistake, IMHO. You might draw the line where it makes sense for your sites, but not someone else's.
I couldn't agree more. I think of static page content as "configuration" while the dynamic story content as "content".
As you point out, the list is highly debatable. I think it's undecidable.
Well, at least by what we might decide; it will never fit all scenarios. The best we could do is strive to give what we think is best and worst case scenarios. Earnie -- http://for-my-kids.com/ -- http://give-me-an-offer.com/
Earnie Boyd wrote:
Quoting Dave Cohen <drupal@dave-cohen.com>:
Personally, I don't see how two DBs improves things. In my experience, nodes are often "configuration" as well as "content". Trying to draw that line somewhere is a mistake, IMHO. You might draw the line where it makes sense for your sites, but not someone else's.
I couldn't agree more. I think of static page content as "configuration" while the dynamic story content as "content".
If we take /any/ node as "configuration", we really are in trouble...
Let's think of a classic scenario (before we dive into esoteric ones) where the only changes to the DB on the production site are:
- adding / modifying nodes, comments, files and users
- any watchdog entries, timestamps or counters, as a result of the above
While on the dev/staging servers, we change the rest:
- views config
- CCK structure
- block position / config
- probably most of the things under admin/*
Menus are a special case, since adding a menu item on production can be considered adding content, as opposed to changing the menu's block position or changing fields in views.
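The classic scenario can be written down as a simple per-table policy: tables that production owns are never overwritten, menus are flagged for a human, and everything else is pushed from dev/staging. The sketch below is only an illustration of that policy; the table names follow Drupal 6 core naming but should be treated as assumptions to check against the actual schema, and `plan_push` is a hypothetical helper.

```python
# Tables changed on production in the classic scenario (assumed names).
PRODUCTION_OWNED = {"node", "node_revisions", "comments", "files", "users", "watchdog"}

# Menus are the special case: part content, part configuration.
NEEDS_REVIEW = {"menu_links"}


def plan_push(tables):
    """Sort table names into push / keep / review buckets for a dev -> prod push."""
    plan = {"push": [], "keep": [], "review": []}
    for table in sorted(tables):
        if table in NEEDS_REVIEW:
            plan["review"].append(table)
        elif table in PRODUCTION_OWNED:
            plan["keep"].append(table)
        else:
            plan["push"].append(table)
    return plan


plan = plan_push(["variable", "node", "blocks", "menu_links", "users"])
```

A per-site override of the two sets would be the obvious place to record local policies, such as static pages that a site treats as configuration.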
As you point out, the list is highly debatable. I think it's undecidable.
I assume the reason it seems undecidable is because each one has his/her own policies and tricks. If we come out with a /fairly good/ solution, one that would fit 80% of all cases, it will help us adapt ourselves to it.
participants (20)
- Adrian Rossouw
- Alex Barth
- Allie Micka
- arthur
- Dave Cohen
- Earnie Boyd
- Ernst Plüss
- Gina Beisel
- Greg Dunlap
- Katherine Senzee
- Kathleen Murtagh
- Larry Garfield
- Marjorie Roswell
- Michael Haggerty
- Sam Boyer
- Senpai
- Victor Kane
- victorkane@gmail.com
- Yuval Hager
- Zohar Stolar