[development] Git best practices for client codebases

Tue Mar 1 20:17:42 UTC 2011

Unless there's something new in the packager I've not seen yet, pulling 
from Drupal.org (d.o) Git repositories in production bypasses the 
packager.  That is, you're then missing:

- The full version information in the .info file, which is used by the 
Update Manager.
- The LICENSE.txt file that every module is supposed to have.

Is that no longer the case?  I'm pretty sure both of those still only 
happen with a tarball, so if you want those (and I do) then you need to 
use a tarball.
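For illustration, here is a rough Python sketch of the kind of footer the packager appends to each .info file in a tarball, which a plain Git checkout won't have. The field set (version, core, project, datestamp) and the comment line reflect my understanding of the Drupal 7-era packaging script and should be treated as an assumption; packager_info_block is a hypothetical helper, not part of Drush or any drupal.org tooling.

```python
from datetime import datetime, timezone


def packager_info_block(project: str, version: str, core: str, datestamp: int) -> str:
    """Build a .info footer resembling the one the drupal.org packager appends.

    Hypothetical helper: the field set is an assumption based on D7-era
    tarballs; compare against a real release before relying on it.
    """
    # The packager stamps the packaging date (UTC) in a leading comment.
    day = datetime.fromtimestamp(datestamp, tz=timezone.utc).strftime("%Y-%m-%d")
    lines = [
        "",
        f"; Information added by drupal.org packaging script on {day}",
        f'version = "{version}"',
        f'core = "{core}"',
        f'project = "{project}"',
        f'datestamp = "{datestamp}"',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only; not taken from a real release.
    print(packager_info_block("example_module", "7.x-1.0", "7.x", 1298937600))
```

Without this footer, the Update Manager has no reliable version string to compare against upstream releases.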

Also, if you want to manage both core and contrib modules that way, you 
are now using either Git submodules, which AFAIK are generally agreed to 
suck, or complex subtree merging that is out of reach of 99% of 
developers.  Hell, I've done it and I don't want to do it. :-)

Shallow clones are fine for reducing disk usage, certainly.  But there 
are workflow considerations there that I don't believe Git solves (at 
least not yet).  If I'm not maintaining site-specific or company-specific 
branches of core or modules (that is, hacking core or hacking modules, 
which is a no-no in 95% of cases), then the extra patch-level control 
that the more complex all-Git approach would allow is useless because 
I'm not even using it.
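A shallow clone of a single branch looks roughly like the following sketch, which only builds the git command and optionally runs it. The repository URL and the 7.x branch are assumptions for illustration, and shallow_clone_cmd is a hypothetical helper. On current Git versions --depth implies --single-branch, so other branches are skipped; older Git releases may fetch more than this.

```python
import subprocess
import sys
from typing import List


def shallow_clone_cmd(url: str, branch: str, dest: str, depth: int = 1) -> List[str]:
    """Build argv for a shallow clone of one branch (hypothetical helper).

    --depth keeps only the most recent `depth` commits of history;
    --branch selects which branch (or tag) to check out.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), "--branch", branch, url, dest]


if __name__ == "__main__":
    # Assumed URL and branch, for illustration only.
    cmd = shallow_clone_cmd("https://git.drupalcode.org/project/drupal.git", "7.x", "drupal")
    print(" ".join(cmd))
    # Pass --run to actually perform the clone (needs git and network access).
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```

This saves disk space, but it does nothing about the workflow questions above: you still need a convention for where your own patches live.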

I'm not saying there are no use cases for an all-git-all-the-time site 
building process, just that it has implications that you're glossing 
over in return for a benefit that the majority of use cases don't even need.

--Larry Garfield

On 3/1/11 12:38 PM, Sam Boyer wrote:
>
>
> On 3/1/11 8:13 AM, larry at garfieldtech.com wrote:
>> I think the question is more about non-custom dev history; there's
>> little need for a client site to have the complete development history
>> of Drupal 4.3 in its repo, for instance.
>
> So you do a shallow clone that skips irrelevant branches and only grabs
> recent history on the ones you want; that's fine.
>
>>
>> Lately, what I've been doing/advocating is using Drush and real releases
>> to download stuff from Drupal.org (core, contrib modules, etc.) and then
>> checking the whole site into Git.  If I update a module, I use Drush for
>> that and then update the code in my Git repo. Then deploy to production
>> using *my* git repo (which has my full dev history but not every commit
>> in every one of my projects ever) and tags.
>
> ...which is *exactly* what I'm saying is pointless. Why stick a stupider
> intermediary - tarballs - into a system that's already highly capable of
> doing patch & vendor management? The only thing you've accomplished is
> diluting the capabilities of your version control system to manage
> upstream changes.
>
>>
>> That keeps me on real releases, avoids unnecessary repository bloat, but
>> still gives me a full history of all work on that project specifically.
>
> "Unnecessary repository bloat?" Two great words there, let's address
> each one:
>
> "Unnecessary": well, the full branch history is a requirement if you
> want to use git's smart merging algorithms. So the only way it's
> "unnecessary" is if you prefer manually hauling chunks out of
> patch-generated .rej and .orig files.
>
> "Bloat": Really, step back and think about this. Are you solving a real,
> compelling problem faced by most modern servers? How much does it matter
> that your Drupal tree is, say, 70MB instead of 700MB? It really doesn't.
> Not even on shared hosting. And, let's not forget - judicious use of
> shallow clones & compression whittles that number way, WAY down. IMO,
> ripping out the vendor history is something a lot of us got in the habit
> of doing because we were used to having CVS vendor data that earned us
> nothing but headaches, and it was an easy "optimization" that made our
> Drupal trees feel more svelte.
>
> Well, now it does get you something. It gets you a _ton_. Now, all you
> need for company-specific or site-specific customizations that can
> easily coexist with rich vendor data is some branch naming conventions
> and practice with reading git logs. Yeah, that takes some learning too,
> but it's worth it.
>
> cheers
> s
>
>>
>> --Larry Garfield
>>
>> On 3/1/11 1:56 AM, Sam Boyer wrote:
>>> I tend to advocate a full clone. You're talking about a task that version
>>> control is designed for. Now that we've made the switch, IMO native
>>> code : Git :: bytecode : another VCS (or worse, patch stacks, etc.). I
>>> don't know what Drush did before to "make this easy" - maybe pop off
>>> patch stacks, update the module, then re-apply the patches? Fact is,
>>> though, nothing Drush could have done under CVS can compare to patching
>>> with native Git commits: your patches can speak the same language as
>>> upstream changes, and you have all of Git's merge & rebase behavior at
>>> your fingertips to reconcile them.
>>>
>>> There are some occasional exceptions to this, but I really do think it's
>>> a bit daft not to keep the full history. Keeping that history means
>>> peace of mind that your patches (now commits) can be intelligently
>>> merged with all changes ever made to that module for all time, across
>>> new versions, across Drupal major versions...blah blah blah. Trading a
>>> few hundred MB of disk space for that is MORE than worth it.
>>>
>>> cheers
>>> s
>>>
>>> On 2/28/11 10:56 AM, Marco Carbone wrote:
>>>> Since a Git clone downloads the entire Drupal repository, the Drupal
>>>> codebase is no longer so lightweight (~50MB) if you are using Git,
>>>> especially if you clone contrib module repositories as well.
>>>>
>>>> With CVS, our usual practice with clients was to check out core and
>>>> contrib using CVS, so that we could easily track any patches that had
>>>> been applied and make sure they wouldn't be lost when updating to newer
>>>> releases.  (Drush makes this particularly easy.) This is doable with Git
>>>> as well, but now there seems to be the added cost of having to download
>>>> the full repository. This is great when doing core/contrib development,
>>>> but not really necessary for client work. This is unavoidable as far as
>>>> I can tell, but I don't think I'm satisfied with the "just use a tarball
>>>> and don't hack core/contrib" solution, especially when patches come into
>>>> play.
>>>>
>>>> Is there something I'm missing/not understanding here, or does one just
>>>> have to accept the price of a bigger codebase when using Git to manage
>>>> core/contrib code? Or is managing core/contrib code this way passé now
>>>> that updates can be done through the UI?
>>>>
>>>> -marco
>>>
>>>
>
>