Git best practices for client codebases
Since a Git clone downloads the entire Drupal repository, the Drupal codebase is no longer so lightweight (~50MB) if you are using Git, especially if you clone contrib module repositories as well.
With CVS, our usual practice with clients was to check out core and contrib using CVS, so that we could easily monitor any patches that had been applied and they wouldn't be lost when updating to newer releases. (Drush makes this particularly easy.) This is doable with Git as well, but now there seems to be the added cost of having to download the full repository. The full history is great when doing core/contrib development, but not really necessary for client work. The cost is unavoidable as far as I can tell, but I don't think I'm satisfied with the "just use a tarball and don't hack core/contrib" solution, especially when patches come into play.
Is there something I'm missing/not understanding here, or does one just have to accept the price of a bigger codebase when using Git to manage core/contrib code? Or is managing core/contrib code this way passé now that updates can be done through the UI?
-marco
The cost in time of doing the checkouts is quite a bit less using the cache-repository techniques in http://randyfay.com/node/93. That might be useful to you, might not.
-Randy
(In reply to Marco Carbone, Mon, Feb 28, 2011 at 11:56 AM.)
--
Randy Fay
Drupal Module and Site Development
randy@randyfay.com
+1 970.462.7450
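A minimal sketch of what a cache repository can look like, assuming it works along the lines of git's `--reference` option (the linked article may differ in detail). The `upstream`, `cache.git` and `client-site` names are hypothetical, and a throwaway local repository stands in for drupal.org so the commands run without network access.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Hypothetical stand-in for the upstream project repository.
git init -q upstream
echo "core" > upstream/index.php
git -C upstream add index.php
git -C upstream commit -q -m "Initial import"

# One-time, full-cost clone kept on the workstation as a local object cache.
git clone -q --mirror "file://$work/upstream" cache.git

# Per-client checkout: objects already in the cache are borrowed, not downloaded again.
git clone -q --reference "$work/cache.git" "file://$work/upstream" client-site

# The clone records where it borrows objects from.
cat client-site/.git/objects/info/alternates
```

The trade-off is that `client-site` depends on the cache staying in place; deleting or pruning `cache.git` can leave the borrowing clone with missing objects.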
Another option to consider is to track patches using Drush Make which considerably lightens what you need to store in Git.
-- Kyle Mathews
Blog: kyle.mathews2000.com/blog
Twitter: http://twitter.com/kylemathews
Company: http://eduglu.com
(In reply to Marco Carbone, Mon, Feb 28, 2011 at 10:56 AM.)
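For readers who have not used Drush Make, a make file along these lines is roughly what the suggestion above amounts to. This is only a sketch: the file name, the version numbers and the patch URL are placeholders, not taken from the thread.

```ini
; Hypothetical site.make: versions and the patch URL below are placeholders.
core = 6.x
api = 2

projects[drupal][version] = 6.20

projects[views][version] = 2.12
; A patch is recorded as a URL and applied when the site is built.
projects[views][patch][] = "http://example.com/views-local-fix.patch"
```

Only this small file (plus custom code) needs to live in the site's Git repository; core and contrib are rebuilt from it.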
I tend to advocate full clone. You're talking about a task that version control is designed for. Now that we've made the switch, IMO native code:Git::bytecode:another VCS, or worse, patch stacks, etc. I don't know what drush did before to "make this easy" - maybe pop off patch stacks, update the module, then re-apply the patches? Fact is, though, nothing Drush could have done under CVS can compare to patching with native Git commits: your patches can speak the same language as upstream changes, and you have all of Git's merge & rebase behavior at your fingertips to reconcile them.
There are some occasional exceptions to this, but I really do think it's a bit daft not to keep the full history. Keeping that history means peace of mind that your patches (now commits) can be intelligently merged with all changes ever made to that module for all time, across new versions, across Drupal major versions...blah blah blah. Trading a few hundred MB of disk space for that is MORE than worth it.
cheers
s
(In reply to Marco Carbone, 2/28/11 10:56 AM.)
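To make the "patches as native Git commits" idea concrete, here is a small sketch under stated assumptions: the repository, the `client-site` branch name, the `example.module` file and the `write_module` helper are all hypothetical, and a local repository plays the role of the upstream project. The client's fix is an ordinary commit that gets replayed on top of the next upstream release with `git rebase`.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Hypothetical helper: write a 7-line module file with a given first and last line.
write_module() {  # usage: write_module <repo> <first line> <last line>
  printf '%s\n' "$2" "line 2" "line 3" "line 4" "line 5" "line 6" "$3" > "$1/example.module"
}

# Upstream module repository with a 1.0 release.
git init -q upstream
write_module upstream "line 1" "line 7"
git -C upstream add example.module
git -C upstream commit -q -m "Release 1.0"
git -C upstream tag 1.0

# Client clone: the local fix is an ordinary commit on a site branch.
git clone -q "$work/upstream" client
git -C client checkout -q -b client-site 1.0
write_module client "line 1 (client fix)" "line 7"
git -C client commit -q -am "Client fix: adjust line 1"

# Upstream ships 1.1, touching a different part of the file.
write_module upstream "line 1" "line 7 (upstream 1.1)"
git -C upstream commit -q -am "Release 1.1"
git -C upstream tag 1.1

# Update: fetch the new release and replay the client commit on top of it.
git -C client fetch -q --tags origin
git -C client rebase -q 1.1
cat client/example.module
```

A merge (`git merge 1.1`) would work as well; rebase is used here only because it keeps the client commits stacked visibly on top of the release tag.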
I think the question is more about non-custom dev history; there's little need for a client site to have the complete development history of Drupal 4.3 in its repo, for instance.
Lately, what I've been doing/advocating is using Drush and real releases to download stuff from Drupal.org (core, contrib modules, etc.) and then checking the whole site into Git. If I update a module, I use Drush for that and then update the code in my Git repo. Then deploy to production using *my* git repo (which has my full dev history but not every commit in every one of my projects ever) and tags.
That keeps me on real releases, avoids unnecessary repository bloat, but still gives me a full history of all work on that project specifically.
--Larry Garfield
(In reply to Sam Boyer, 3/1/11 1:56 AM.)
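The release-based workflow Larry describes can be sketched roughly as follows. Everything named here is an assumption for illustration: `fetch_release` is a hypothetical stand-in for downloading a packaged release with Drush (it only writes a file, so the sketch runs without Drush or network access), and the module versions and `deploy-N` tag names are placeholders.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Hypothetical stand-in for "download a packaged release into the site tree".
fetch_release() {  # usage: fetch_release <module> <version>
  mkdir -p "sites/all/modules/$1"
  echo "version = $2" > "sites/all/modules/$1/$1.info"
}

# The site repository holds the whole site, but only the site's own history.
git init -q site && cd site

fetch_release views 6.x-2.11
git add -A
git commit -q -m "Add views 6.x-2.11 from release"
git tag deploy-1

# Updating a module is one commit that replaces the old release with the new one.
fetch_release views 6.x-2.12
git add -A
git commit -q -m "Update views to 6.x-2.12"
git tag deploy-2

# Production would check out a tag such as deploy-2.
git log --oneline
```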
"That keeps me on real releases, avoids unnecessary repository bloat, but still gives me a full history of all work on that project specifically."
Well, svn or whatever VCS one is already using could be used this way as well. And it doesn't really address the issue about managing patches, which probably means that you either don't apply them (I doubt that), or you avoid overwriting them by careful management (a patches directory or careful monitoring of commit logs). But it's true that we aren't in the Wild West days of Drupal 4.7/5 anymore where core patches were more common than not, and so perhaps manual management is perfectly reasonable, and worth avoiding the ball and chain of storing every Drupal commit ever.
-marco
(In reply to larry@garfieldtech.com, Tue, Mar 1, 2011 at 11:13 AM.)
Yeah, I don't patch core often enough to need an elaborate patch management system. :-) Just checking patch files that are clearly named into the repo is usually fine.
--Larry Garfield
(In reply to Marco Carbone, 3/1/11 10:30 AM.)
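As an illustration of keeping clearly named patch files in the site repository, here is one possible shape for it. The `patches/` directory, the file names and the `write_module` helper are hypothetical; the point is only that the stored patch can be re-applied with `git apply` after a release overwrites the module.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q site && cd site
mkdir -p modules/example patches

# Hypothetical helper: write a 9-line module file with a given 2nd and 9th line.
write_module() {  # usage: write_module <line 2> <line 9>
  printf '%s\n' "line 1" "$1" "line 3" "line 4" "line 5" "line 6" "line 7" "line 8" "$2" \
    > modules/example/example.module
}

# Release 1.0 of the module, committed as downloaded.
write_module "line 2" "line 9"
git add -A && git commit -q -m "Add example module 1.0"

# Make the local change, then save it as a clearly named patch file.
write_module "line 2 (local fix)" "line 9"
git diff > patches/example-local-fix.patch
git add -A && git commit -q -m "Patch example module; keep the patch in patches/"

# Later: release 1.1 overwrites the module and the local change is gone...
write_module "line 2" "line 9 (release 1.1)"
# ...so re-apply the stored patch and commit the result.
git apply patches/example-local-fix.patch
git add -A && git commit -q -m "Update example module to 1.1 and re-apply local fix"
cat modules/example/example.module
```

If the new release changes the same lines as the patch, `git apply` refuses to apply it and the patch has to be rerolled by hand, which is the manual step the full-history approach tries to avoid.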
On 3/1/11 8:13 AM, larry@garfieldtech.com wrote:
> I think the question is more about non-custom dev history; there's little need for a client site to have the complete development history of Drupal 4.3 in its repo, for instance.
So you do a shallow clone that skips irrelevant branches and only grabs recent history on the ones you want, that's fine.
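A shallow, single-branch clone of the kind described here might look like the following sketch. The `6.x` and `7.x` branch names mirror Drupal's conventions, but the repository itself is a hypothetical local stand-in so the example runs offline.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Hypothetical upstream with two branches and several commits on each.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/6.x
for i in 1 2 3; do
  echo "6.x change $i" > CHANGELOG.txt
  git add CHANGELOG.txt
  git commit -q -m "6.x commit $i"
done
git checkout -q -b 7.x
for i in 1 2 3; do
  echo "7.x change $i" > CHANGELOG.txt
  git add CHANGELOG.txt
  git commit -q -m "7.x commit $i"
done
cd "$work"

# Shallow clone: only the 7.x branch, only its most recent commit.
# The file:// URL matters in this demo; --depth is ignored for plain local paths.
git clone -q --depth 1 --branch 7.x "file://$work/upstream" client

# Number of commits available locally on the checked-out branch.
git -C client rev-list --count HEAD
```

More history can be pulled in later with `git fetch --depth <n>` or `git fetch --unshallow` if a merge needs it.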
> Lately, what I've been doing/advocating is using Drush and real releases to download stuff from Drupal.org (core, contrib modules, etc.) and then checking the whole site into Git. If I update a module, I use Drush for that and then update the code in my Git repo. Then deploy to production using *my* git repo (which has my full dev history but not every commit in every one of my projects ever) and tags.
...which is *exactly* what I'm saying is pointless. Why stick a stupider intermediary - tarballs - into a system that's already highly capable of doing patch & vendor management? The only thing you've accomplished is diluting the capabilities of your version control system to manage upstream changes.
> That keeps me on real releases, avoids unnecessary repository bloat, but still gives me a full history of all work on that project specifically.
"Unnecessary repository bloat?" Two great words there, let's address each one:
"Unnecessary": well, the full branch history is a requirement if you want to use git's smart merging algorithms. So the only way it's "unnecessary" is if you prefer manually hauling chunks out of patch-generated .rej and .orig files.
"Bloat": Really, step back and think about this. Are you solving a real, compelling problem faced by most modern servers? How much does it matter that your Drupal tree is, say, 70MB instead of 700MB? It really doesn't. Not even on shared hosting. And, let's not forget - judicious use of shallow clones & compression whittles that number way, WAY down. IMO, ripping out the vendor history is something a lot of us got in the habit of doing because we were used to having CVS vendor data that earned us nothing but headaches, and it was an easy "optimization" that made our Drupal trees feel more svelte.
Well, now it does get you something. It gets you a _ton_. Now, all you need for company-specific or site-specific customizations that can easily coexist with rich vendor data is some branch naming conventions and practice with reading git logs. Yeah, that takes some learning too, but it's worth it.
cheers
s
Unless there's something new in the packager I've not seen yet, using d.o pulls in production bypasses the packager. That is, you're then missing:
- The full version information in the info file, which is used by update manager.
- The License.TXT file that every module is supposed to have.
Is that no longer the case? I'm pretty sure both of those still only happen with a tarball, so if you want those (and I do) then you need to use a tarball.
Also, if you want to manage both core and contrib modules that way it means you are now using git submodules, which it's generally agreed suck AFAIK, or complex sub-tree merging that is out of reach of 99% of developers. Hell, I've done it and I don't want to do it. :-)
Shallow clones are fine for removing disk size, certainly. But there's workflow considerations there that I don't believe Git solves (at least not yet).
If I'm not doing site-specific or company-specific branches of core or modules (that is, hacking core or hacking modules, which is a no-no in 95% of cases), then the extra patch-level control that the more complex all-Git approach would allow is useless because I'm not even using it.
I'm not saying there are no use cases for an all-git-all-the-time site building process, just that it has implications that you're glossing over in return for a benefit that the majority of use cases don't even need.
--Larry Garfield
(In reply to Sam Boyer, 3/1/11 12:38 PM.)
Larry,
There is the awesome Git Deploy module that will go out and interface with git meta data and update.drupal.org to determine the correct version information for any module checked out directly from Git. I'm 90% sure that it doesn't even require the git binary to be installed, just a PHP library to access/read git data.
http://drupal.org/project/git_deploy
-Mike
(In reply to larry@garfieldtech.com.)
-- __________________
Michael Prasuhn
503.512.0822 office
mike@mikeyp.net
http://mikeyp.net
On 3/1/11 12:17 PM, larry@garfieldtech.com wrote:
> Unless there's something new in the packager I've not seen yet, using d.o pulls in production bypasses the packager. That is, you're then missing:
> - The full version information in the info file, which is used by update manager.
> - The License.TXT file that every module is supposed to have.
> Is that no longer the case? I'm pretty sure both of those still only happen with a tarball, so if you want those (and I do) then you need to use a tarball.
Ah yes, I definitely was glossing over these - sorry. Mike addressed the version information necessary for deployment; git_deploy takes care of that, and that really is an absolutely CRUCIAL piece. In fact, if some of the additions we'd been talking about went in...well, let's just say it's quite clever :) The license file, though...maybe there's something I don't know, but is there some compelling legal reason that that file actually needs to be present with files deployed on a client site?
> Also, if you want to manage both core and contrib modules that way it means you are now using git submodules, which it's generally agreed suck AFAIK, or complex sub-tree merging that is out of reach of 99% of developers. Hell, I've done it and I don't want to do it. :-)
Git submodules don't suck across the board. They only suck if you try to use them for the wrong purpose - namely, general dev. Which is what most people try to use them for, and then get frustrated. However, they can be *very* effective as part of a strategy for developing a real client site, where you actually do want to record submodule updates in the parent repo (e.g., when you update a set of modules from upstream, you reflect that in the parent repo with a commit of all the updated submodules). Yes, subtrees are a doozy that don't ever enter most folks' repertoire. But plain, nested git repos (that just ignore each other) work delightfully well. They can't rebuild themselves on new servers, but if your deployment strategy is rsync, that's moot.
> Shallow clones are fine for removing disk size, certainly. But there's workflow considerations there that I don't believe Git solves (at least not yet).
OK, there's potential for discussion there. What sort of workflow considerations?
> If I'm not doing site-specific or company-specific branches of core or modules (that is, hacking core or hacking modules, which is a no-no in 95% of cases), then the extra patch-level control that the more complex all-Git approach would allow is useless because I'm not even using it.
A no-no in general, sure, but if one diligently works in Git across the board, it becomes considerably less dangerous. Also - not yet widely understood != more complex. I think the all-Git approach is more the former than the latter.
> I'm not saying there are no use cases for an all-git-all-the-time site building process, just that it has implications that you're glossing over in return for a benefit that the majority of use cases don't even need.
The majority of use cases don't need...until suddenly, they do. Because maybe a year later you, or some other shop who inherits the work, needs to verify the state all the modules (or worse, core) is in, and do some emergency updates. Being able to understand the state of your tree in the context of the full upstream history gives you a stable baseline for verification. Drupal systems (packaging, drush make, etc.) have always been pretty good at getting the right code in place to start with. It's ensuring the right code is still in place later on that we suck at. So yeah - all-git-all-the-time might be a wash when a site build is doing well. But when you're up shit creek, there's no better paddle.
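The kind of later verification described above can be sketched with two standard Git commands, assuming the module was cloned with its upstream history and release tags. The repository, file and tag names are hypothetical, and a local repository again stands in for upstream.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Hypothetical upstream module with a tagged 1.0 release.
git init -q upstream
echo "original" > upstream/example.module
git -C upstream add example.module
git -C upstream commit -q -m "Release 1.0"
git -C upstream tag 1.0

# The client clone carries a local change made as an ordinary commit.
git clone -q "$work/upstream" client
echo "hacked" > client/example.module
git -C client commit -q -am "Local change for client"

# A year later: what differs from the release this site is supposedly running?
git -C client log --oneline 1.0..HEAD   # local commits on top of 1.0
git -C client diff --stat 1.0 HEAD      # files that differ from 1.0
```

With a tarball-based tree the same question has to be answered by downloading the matching release again and diffing directories by hand.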
--Larry Garfield
On 3/1/11 12:38 PM, Sam Boyer wrote:
On 3/1/11 8:13 AM, larry@garfieldtech.com wrote:
I think the question is more about non-custom dev history; there's little need for a client site to have the complete development history of Drupal 4.3 in its repo, for instance.
So you do a shallow clone that skips irrelevant branches and only grabs recent history on the ones you want, that's fine.
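A rough sketch of such a shallow, single-branch clone, assuming a local throwaway repository stands in for the upstream project. The branch names 6.x and 7.x only mimic Drupal's conventions, and the depth of 2 is arbitrary. In current Git, --depth implies --single-branch, so the other branch is not fetched at all.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Build a fake upstream with an old branch and a current branch.
git init -q "$work/upstream"
cd "$work/upstream"
git checkout -q -b 6.x
for n in 1 2 3; do
  echo "6.x change $n" > CHANGELOG.txt
  git add CHANGELOG.txt
  git commit -q -m "6.x commit $n"
done
git checkout -q -b 7.x
for n in 1 2 3 4 5; do
  echo "7.x change $n" > CHANGELOG.txt
  git add CHANGELOG.txt
  git commit -q -m "7.x commit $n"
done

# Shallow clone: only the 7.x branch, only its last two commits.
# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 2 --branch 7.x "file://$work/upstream" "$work/site"
cd "$work/site"
commits=$(git rev-list --count HEAD)
old_branches=$(git branch -r | grep -c '6.x' || true)
```

If more history turns out to be needed later, git fetch --unshallow (or git fetch --depth with a larger number) can deepen the clone in place.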
Lately, what I've been doing/advocating is using Drush and real releases to download stuff from Drupal.org (core, contrib modules, etc.) and then checking the whole site into Git. If I update a module, I use Drush for that and then update the code in my Git repo. Then deploy to production using *my* git repo (which has my full dev history but not every commit in every one of my projects ever) and tags.
...which is *exactly* what I'm saying is pointless. Why stick a stupider intermediary - tarballs - into a system that's already highly capable of doing patch & vendor management? The only thing you've accomplished is diluting the capabilities of your version control system to manage upstream changes.
That keeps me on real releases, avoids unnecessary repository bloat, but still gives me a full history of all work on that project specifically.
"Unnecessary repository bloat?" Two great words there, let's address each one:
"Unnecessary": well, the full branch history is a requirement if you want to use git's smart merging algorithms. So the only way it's "unnecessary" is if you prefer manually hauling chunks out of patch-generated .rej and .orig files.
"Bloat": Really, step back and think about this. Are you solving a real, compelling problem faced by most modern servers? How much does it matter that your Drupal tree is, say, 70MB instead of 700MB? It really doesn't. Not even on shared hosting. And, let's not forget - judicious use of shallow clones & compression whittles that number way, WAY down. IMO, ripping out the vendor history is something a lot of us got in the habit of doing because we were used to having CVS vendor data that earned us nothing but headaches, and it was an easy "optimization" that made our Drupal trees feel more svelte.
Well, now it does get you something. It gets you a _ton_. Now, all you need for company-specific or site-specific customizations that can easily coexist with rich vendor data is some branch naming conventions and practice with reading git logs. Yeah, that takes some learning too, but it's worth it.
cheers s
--Larry Garfield
On 3/1/11 1:56 AM, Sam Boyer wrote:
I tend to advocate a full clone. You're talking about a task that version control is designed for. Now that we've made the switch, IMO native code : Git :: bytecode : another VCS, or worse, patch stacks, etc. I don't know what Drush did before to "make this easy" - maybe pop off patch stacks, update the module, then re-apply the patches? Fact is, though, nothing Drush could have done under CVS can compare to patching with native Git commits: your patches can speak the same language as upstream changes, and you have all of Git's merge & rebase behavior at your fingertips to reconcile them.
There are some occasional exceptions to this, but I really do think it's a bit daft not to keep the full history. Keeping that history means peace of mind that your patches (now commits) can be intelligently merged with all changes ever made to that module for all time, across new versions, across Drupal major versions...blah blah blah. Trading a few hundred MB of disk space for that is MORE than worth it.
cheers s
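A small sketch of what "patches as commits" can look like in practice, assuming a throwaway local repository plays the role of the upstream module. The branch names (upstream, client-site), the tags (7.x-1.0, 7.x-1.1), and the file contents are hypothetical. The local fix lives as an ordinary commit and is replayed onto the new release with git rebase; a merge would work similarly.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/module"
cd "$work/module"
git checkout -q -b upstream
printf '%s\n' one two three four five six seven eight > module.inc
git add module.inc
git commit -q -m "Release 7.x-1.0"
git tag 7.x-1.0

# The site-specific patch is an ordinary commit on its own branch.
git checkout -q -b client-site
printf '%s\n' one two three four five six seven 'eight (client fix)' > module.inc
git commit -q -am "Client fix for the last line"

# Upstream ships a new release touching a different part of the file.
git checkout -q upstream
printf '%s\n' 'one (upstream change)' two three four five six seven eight > module.inc
git commit -q -am "Release 7.x-1.1"
git tag 7.x-1.1

# Replay the client patch on top of the new release.
git checkout -q client-site
git rebase -q 7.x-1.1
local_patches=$(git rev-list --count 7.x-1.1..HEAD)
```

After the rebase, git log 7.x-1.1..HEAD lists exactly the local patches still carried on top of the release.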
On 2/28/11 10:56 AM, Marco Carbone wrote:
Since a Git clone downloads the entire Drupal repository, the Drupal codebase is no longer so lightweight (~50MB) if you are using Git, especially if you clone contrib module repositories as well.
With CVS, our usual practice with clients was to checkout core and contrib using CVS, so that we can easily monitor any patches that have been applied, so that they wouldn't be lost when updating to newer releases. (Drush makes this particularly easy.) This is doable with Git as well, but now there seems to be the added cost of having to download the full repository. This is great when doing core/contrib development, but not really necessary for client work. This is unavoidable as far as I can tell, but I don't think I'm satisfied with the "just use a tarball and don't hack core/contrib" solution, especially when patches come into play.
Is there something I'm missing/not understanding here, or does one just have to accept the price of a bigger codebase when using Git to manage core/contrib code? Or is managing core/contrib code this way passe now that updates can be done through the UI?
-marco
On Wed, Mar 2, 2011 at 7:10 AM, Sam Boyer <drupal@samboyer.org> wrote:
Also, if you want to manage both core and contrib modules that way it means you are now using git submodules, which it's generally agreed suck AFAIK, or complex sub-tree merging that is out of reach of 99% of developers. Hell, I've done it and I don't want to do it. :-)
Git submodules don't suck across the board. They only suck if you try to use them for the wrong purpose - namely, general dev. Which is what most people try to use them for, and then get frustrated. However, they can be *very* effective as part of a strategy for developing a real client site, where you actually do want to record submodule updates in the parent repo (e.g., when you update a set of modules from upstream, you reflect that in the parent repo with a commit of all the updated submodules).
Yes, subtrees are a doozy that don't ever enter most folks' repertoire. But plain, nested git repos (that just ignore each other) work delightfully well. They can't rebuild themselves on new servers, but if your deployment strategy is rsync, that's moot.
Well, it would be good to document the suggestion for "nested git trees with gitignores" as a best practice if you think it is, because the last time I asked on #gitsupport, subrepos were suggested as per http://freso.dk/en/2011/02/26/managing_fresodk_from_cvs_in_svn_to_git (no, they did not add that they suck :). The guide currently linked from the main Git docs page for site builders (http://drupal.org/node/803746) is very outdated (it assumes contrib is in CVS, and generally that you use a git repo mirrored from CVS), and developers flock to blog posts like that one, which seem to be up to date and supposedly contain best practice suggestions.
Gábor
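One possible reading of "nested git trees with gitignores", as a minimal sketch: the site repo ignores the module directory, and the module directory is its own independent repo. The layout (sites/all/modules/views) and file names are hypothetical, and in real use the nested repo would be a clone of the upstream module rather than a fresh git init.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# The site repo, which ignores the nested module checkout.
git init -q "$work/site"
cd "$work/site"
mkdir -p sites/all/modules
echo "<?php" > index.php
echo "/sites/all/modules/views/" > .gitignore
git add index.php .gitignore
git commit -q -m "Site skeleton"

# An independent repo nested inside the site tree.
git init -q sites/all/modules/views
(
  cd sites/all/modules/views
  echo "v1" > views.module
  git add views.module
  git commit -q -m "Module code"
)

# The outer repo sees nothing to track; the inner repo has its own history.
site_status=$(git status --porcelain)
module_commits=$(git -C sites/all/modules/views rev-list --count HEAD)
```

Because the outer repo never records the module, a fresh clone of the site will not contain it, which is the "can't rebuild themselves on new servers" caveat from the thread.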
Definitely would be worth some updated docs. I'm trying to get to such docs (or delegate the responsibility for it) as fast as I can :)
I think your last statement feels correct to me. IMHO using a VCS for deployment is great for developers and clients who are contributing back to the codebase, but customers would be much better served by being kept up to date with releases rather than individual patches.
On Feb 28, 2011, at 10:56 AM, Marco Carbone <marco.carbone@gmail.com> wrote:
Is there something I'm missing/not understanding here, or does one just have to accept the price of a bigger codebase when using Git to manage core/contrib code? Or is managing core/contrib code this way passe now that updates can be done through the UI?
-marco
How about doing a shallow clone with git clone --depth=1 and then running a recursive diff on the subdirectories? (A depth of 1 is just an example.) Can someone comment on whether this approach is recommended?
-- fireh --
Clone with depth parameters will help reduce size, yep. Not sure what you're going for with a diff on subdirs - if you mean diffing modules, then diffing subdirs really depends on how you've chosen to compose your tree - submodules, subtree merges, or plain nested repos.
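One way to read the "shallow clone plus recursive diff" idea, sketched under stated assumptions: a local throwaway repo stands in for the upstream module, the deployed copy is simulated with git archive plus a hand edit, and the tag name 7.x-1.0 and file names are hypothetical. The shallow clone of the release tag serves as a pristine reference, and diff -r reports which deployed files have drifted from it.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Fake upstream module with a tagged release and later development.
git init -q "$work/upstream"
cd "$work/upstream"
echo "name = Views" > views.info
echo "<?php // release code" > views.module
git add views.info views.module
git commit -q -m "Release 7.x-1.0"
git tag 7.x-1.0
echo "<?php // newer dev code" > views.module
git commit -q -am "Later development"

# Simulate a deployed copy of the release that carries a local hack.
mkdir "$work/deployed"
git archive 7.x-1.0 | tar -x -C "$work/deployed"
echo "// local hack" >> "$work/deployed/views.module"

# Shallow clone of just the release tag as a pristine reference.
# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 --branch 7.x-1.0 "file://$work/upstream" "$work/pristine"

# Recursive diff, skipping Git metadata; diff exits non-zero on differences.
changed=$(diff -r -q -x .git "$work/pristine" "$work/deployed" || true)
```

This only tells you that files differ from the release; with a depth-1 clone there is no upstream history to merge the local changes against.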
On 3/1/11 7:53 AM, Fahri Reza wrote:
How about doing a shallow clone with git clone --depth=1 and then running a recursive diff on the subdirectories? (A depth of 1 is just an example.)
Can someone comment on whether this approach is recommended?
Clone with depth parameters will help reduce size, yep.
I just thought that if I use this, it's pretty much like fetching Drupal without the tar.gz compression, and the benefit of Git's incremental diffs is not really put to use.
Not sure what you're going for with a diff on subdirs - if you mean diffing modules, then diffing subdirs really depends on how you've chosen to compose your tree - submodules, subtree merges, or plain nested repos.
He he, my KMail doesn't support quoting for now. It's related to the OP's question: "... our usual practice with clients was to checkout core and contrib using CVS, so that we can easily monitor any patches that have been applied ..."
-- fireh --
participants (9)
- Dave Metzler
- Fahri Reza
- Gábor Hojtsy
- Kyle Mathews
- larry@garfieldtech.com
- Marco Carbone
- Michael Prasuhn
- Randy Fay
- Sam Boyer