[development] Git best practices for client codebases

Wed Mar 2 06:10:54 UTC 2011

On 3/1/11 12:17 PM, larry at garfieldtech.com wrote:
> Unless there's something new in the packager I've not seen yet, using
> d.o pulls in production bypasses the packager.  That is, you're then
> missing:
> 
> - The full version information in the info file, which is used by update
> manager.
> - The LICENSE.txt file that every module is supposed to have.
> 
> Is that no longer the case?  I'm pretty sure both of those still only
> happen with a tarball, so if you want those (and I do) then you need to
> use a tarball.

Ah yes, I definitely was glossing over these - sorry. Mike addressed the
version information necessary for deployment; git_deploy takes care of
that, and that really is an absolutely CRUCIAL piece. In fact, if some
of the additions we'd been talking about went in...well, let's just say
it's quite clever :)

The license file, though...maybe there's something I don't know, but is
there some compelling legal reason that that file actually needs to be
present with files deployed on a client site?

> 
> Also, if you want to manage both core and contrib modules that way it
> means you are now using git submodules, which it's generally agreed suck
> AFAIK, or complex sub-tree merging that is out of reach of 99% of
> developers.  Hell, I've done it and I don't want to do it. :-)

Git submodules don't suck across the board. They only suck if you try to
use them for the wrong purpose - namely, general dev. Which is what most
people try to use them for, and then get frustrated. However, they can
be *very* effective as part of a strategy for developing a real client
site, where you actually do want to record submodule updates in the
parent repo (e.g., when you update a set of modules from upstream, you
reflect that in the parent repo with a commit of all the updated
submodules).
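
A minimal sketch of that "update the submodules, then record it in the
parent" step, written in Python only so the sequence is explicit. The
function name, paths, and tags are hypothetical, and `git -C` assumes a
reasonably recent Git. It only builds the commands; it does not run them.

```python
import shlex


def submodule_update_commands(submodules, message):
    """Build git commands that pin submodules and record that in the parent.

    `submodules` maps a submodule path to the upstream tag to check out.
    Paths and tags are supplied by the caller; none are assumed here.
    """
    commands = []
    for path, tag in sorted(submodules.items()):
        # Fetch upstream tags inside the submodule, then move it to the tag.
        commands.append(["git", "-C", path, "fetch", "--tags", "origin"])
        commands.append(["git", "-C", path, "checkout", tag])
    # Staging a submodule path records its new commit in the parent repo.
    commands.append(["git", "add", "--"] + sorted(submodules))
    # One parent commit reflects the whole batch of upstream updates.
    commands.append(["git", "commit", "-m", message])
    return commands


if __name__ == "__main__":
    example = {"sites/all/modules/views": "7.x-3.0"}  # hypothetical values
    for argv in submodule_update_commands(example, "Update contrib"):
        print(shlex.join(argv))
```

The point of the single commit at the end is that the parent repo's
history then shows exactly when each set of modules moved.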

Yes, subtrees are a doozy that don't ever enter most folks' repertoire.
But plain, nested git repos (that just ignore each other) work
delightfully well. They can't rebuild themselves on new servers, but if
your deployment strategy is rsync, that's moot.
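
One way to get nested repos to "ignore each other" is for the parent to
list each nested checkout in its .gitignore. The sketch below is an
illustration under that assumption, not a description of any existing
tool: the function name is made up, and it treats any directory that
contains a `.git` directory as a nested repo.

```python
import os


def nested_repo_ignore_lines(root):
    """Return .gitignore lines for every nested Git repo found below `root`."""
    lines = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if dirpath != root and ".git" in dirnames:
            # A nested repo: ignore it from the parent and do not descend.
            rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
            lines.append("/" + rel + "/")
            dirnames[:] = []
            continue
        # Never walk into a .git directory itself (including the parent's).
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
    return sorted(lines)


if __name__ == "__main__":
    for line in nested_repo_ignore_lines("."):
        print(line)
```

The leading slash anchors each pattern to the parent repo's root, and
the trailing slash restricts it to directories.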

> 
> Shallow clones are fine for removing disk size, certainly.  But there's
> workflow considerations there that I don't believe Git solves (at least
> not yet).

OK, there's potential for discussion there. What sort of workflow
considerations?

> If I'm not doing site-specific or company-specific branches
> of core or modules (that is, hacking core or hacking modules, which is a
> no-no in 95% of cases), then the extra patch-level control that the more
> complex all-Git approach would allow is useless because I'm not even
> using it.

A no-no in general, sure, but if one diligently works in Git across the
board, it becomes considerably less dangerous.

Also - not yet widely understood != more complex. I think the all-Git
approach is more the former than the latter.

> 
> I'm not saying there are no use cases for an all-git-all-the-time site
> building process, just that it has implications that you're glossing
> over in return for a benefit that the majority of use cases don't even
> need.

The majority of use cases don't need...until suddenly, they do. Because
maybe a year later you, or some other shop who inherits the work, needs
to verify the state all the modules (or worse, core) are in, and do some
emergency updates. Being able to understand the state of your tree in
the context of the full upstream history gives you a stable baseline for
verification.
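
As a rough illustration of what that verification can look like when the
upstream history is present, here is a small Python sketch that lists
the Git commands one might run inside a module's clone. The function
name is hypothetical and the tag is whatever upstream release the site
is supposed to be on; the sketch builds the commands without running
them.

```python
def verification_commands(upstream_tag):
    """Build git commands that compare a working tree to an upstream tag."""
    return [
        # Uncommitted edits in the working tree (empty output means clean).
        ["git", "status", "--porcelain"],
        # Local commits sitting on top of the upstream release.
        ["git", "log", "--oneline", upstream_tag + "..HEAD"],
        # Files that differ from the release, with a per-file summary.
        ["git", "diff", "--stat", upstream_tag, "HEAD"],
    ]


if __name__ == "__main__":
    for argv in verification_commands("7.x-3.0"):  # hypothetical tag
        print(" ".join(argv))
```

None of this works against a tarball-derived tree, because the tag and
the history behind it are not there to compare against.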

Drupal systems (packaging, drush make, etc.) have always been pretty
good at getting the right code in place to start with. It's ensuring the
right code is still in place later on that we suck at. So yeah -
all-git-all-the-time might be a wash when a site build is doing well.
But when you're up shit creek, there's no better paddle.

> 
> --Larry Garfield
> 
> On 3/1/11 12:38 PM, Sam Boyer wrote:
>>
>>
>> On 3/1/11 8:13 AM, larry at garfieldtech.com wrote:
>>> I think the question is more about non-custom dev history; there's
>>> little need for a client site to have the complete development history
>>> of Drupal 4.3 in its repo, for instance.
>>
>> So you do a shallow clone that skips irrelevant branches and only grabs
>> recent history on the ones you want, that's fine.
>>
>>>
>>> Lately, what I've been doing/advocating is using Drush and real releases
>>> to download stuff from Drupal.org (core, contrib modules, etc.) and then
>>> checking the whole site into Git.  If I update a module, I use Drush for
>>> that and then update the code in my Git repo. Then deploy to production
>>> using *my* git repo (which has my full dev history but not every commit
>>> in every one of my projects ever) and tags.
>>
>> ...which is *exactly* what I'm saying is pointless. Why stick a stupider
>> intermediary - tarballs - into a system that's already highly capable of
>> doing patch & vendor management? The only thing you've accomplished is
>> diluting the capabilities of your version control system to manage
>> upstream changes.
>>
>>>
>>> That keeps me on real releases, avoids unnecessary repository bloat, but
>>> still gives me a full history of all work on that project specifically.
>>
>> "Unnecessary repository bloat?" Two great words there, let's address
>> each one:
>>
>> "Unnecessary": well, the full branch history is a requirement if you
>> want to use git's smart merging algorithms. So the only way it's
>> "unnecessary" is if you prefer manually hauling chunks out of
>> patch-generated .rej and .orig files.
>>
>> "Bloat": Really, step back and think about this. Are you solving a real,
>> compelling problem faced by most modern servers? How much does it matter
>> that your Drupal tree is, say, 70MB instead of 700MB? It really doesn't.
>> Not even on shared hosting. And, let's not forget - judicious use of
>> shallow clones & compression whittles that number way, WAY down. IMO,
>> ripping out the vendor history is something a lot of us got in the habit
>> of doing because we were used to having CVS vendor data that earned us
>> nothing but headaches, and it was an easy "optimization" that made our
>> Drupal trees feel more svelte.
>>
>> Well, now it does get you something. It gets you a _ton_. Now, all you
>> need for company-specific or site-specific customizations that can
>> easily coexist with rich vendor data is some branch naming conventions
>> and practice with reading git logs. Yeah, that takes some learning too,
>> but it's worth it.
>>
>> cheers
>> s
>>
>>>
>>> --Larry Garfield
>>>
>>> On 3/1/11 1:56 AM, Sam Boyer wrote:
>>>> I tend to advocate full clone. You're talking about a task that version
>>>> control is designed for. Now that we've made the switch, IMO native
>>>> code:Git::bytecode:another VCS, or worse, patch stacks, etc. I don't
>>>> know what drush did before to "make this easy" - maybe pop off patch
>>>> stacks, update the module, then re-apply the patches? Fact is, though,
>>>> nothing Drush could have done under CVS can compare to patching with
>>>> native Git commits: your patches can speak the same language as
>>>> upstream changes, and you have all of Git's merge & rebase behavior
>>>> at your fingertips to reconcile them.
>>>>
>>>> There are some occasional exceptions to this, but I really do think
>>>> it's a bit daft not to keep the full history. Keeping that history
>>>> means peace of mind that your patches (now commits) can be
>>>> intelligently merged with all changes ever made to that module for
>>>> all time, across new versions, across Drupal major versions...blah
>>>> blah blah. Trading a few hundred MB of disk space for that is MORE
>>>> than worth it.
>>>>
>>>> cheers
>>>> s
>>>>
>>>> On 2/28/11 10:56 AM, Marco Carbone wrote:
>>>>> Since a Git clone downloads the entire Drupal repository, the Drupal
>>>>> codebase is no longer so lightweight (~50MB) if you are using Git,
>>>>> especially if you clone contrib module repositories as well.
>>>>>
>>>>> With CVS, our usual practice with clients was to check out core and
>>>>> contrib using CVS, so that we could easily monitor any patches that
>>>>> had been applied and they wouldn't be lost when updating to newer
>>>>> releases. (Drush makes this particularly easy.) This is doable with
>>>>> Git as well, but now there seems to be the added cost of having to
>>>>> download the full repository. This is great when doing core/contrib
>>>>> development, but not really necessary for client work. This is
>>>>> unavoidable as far as I can tell, but I don't think I'm satisfied
>>>>> with the "just use a tarball and don't hack core/contrib" solution,
>>>>> especially when patches come into play.
>>>>>
>>>>> Is there something I'm missing/not understanding here, or does one
>>>>> just have to accept the price of a bigger codebase when using Git to
>>>>> manage core/contrib code? Or is managing core/contrib code this way
>>>>> passe now that updates can be done through the UI?
>>>>>
>>>>> -marco
>>>>
>>>>
>>
>>
