
Managing Nested Libraries Using the GIT Subtree Merge Workflow


Image by Ville Miettinen | Some Rights Reserved

NOTE: This post is mainly targeted at newer git users who may want a more detailed look at what is happening. If you consider yourself comfortable with git, I have also created a “reference” version that skips most of the narrative:

The Quick Reference

  • Link to condensed version HERE


Problems introduced by “Nested” library projects in a Git Repository

I often find myself building out a project and incorporating another library I have been working on (such as a custom control) as a dependency. The library in question may, in fact, be consumed by other projects as well (that IS the purpose of creating such libraries, right?). Just as often, the library I am referencing from my main project is itself still evolving. A problem arises from either or both of the following:

  • The newly introduced dependency (the “sub-project” e.g. custom control or such) is also being actively developed, and I will want the option to pull in any changes as that library evolves.
  • I may, in the course of working on my “main” project, make some modifications to the sub-project, some or all of which I decide should be incorporated into the main line of development of the shared library.

I was aware of only three ways to manage dependencies on shared libraries:

  • Set a reference to the binaries of the shared library from within my main project, perform all modifications to the shared library from within that project, and re-build so that changes are reflected in the binary outputs.
  • Use Git submodules to refer to a specific commit within the library project, and manage the updating of these commit pointers as the library project evolves.
  • Copy the dependency project source into the main project. Yuck. Isn’t the whole idea of libraries to avoid this type of thing?

I have not used Git submodules much, but my limited experience with them indicates that, outside of certain circumstances, they are a pain. Equally unsatisfactory under the conditions I described above is managing a standard binary reference (especially in cases where I may need to customize the shared library within my main project while keeping it up to date with the development repo). The third option is not even worth considering.

Sub-tree Merge – The best of Both Worlds, with Only Half the Pain

In searching for an optimal way to handle the “nested libraries” problem, I came across a few articles on the Sub-tree Merge workflow in Git. In this workflow, we accomplish the following (this represents my understanding of what is going on – if you know differently, please let me know in the comments!):

  • Maintain a tracking branch which tracks the remote shared library project, from which we can pull in changes made in mainline development of the shared library. This remote tracking branch tracks and represents the history of the shared library, separately from the history of the main project.
  • From this remote, create a “subtree” as a subdirectory within the master branch (or whichever branch we designate) of the main project. The new sub-directory contains a copy of the shared library source code. This subtree does not bring with it the history of commits in the actual shared library. Instead, it joins and shares history with the main project.
  • Once this is done, we can pull new change sets down from the shared library remote as needed, and merge them into our subtree (there’s a trick to this, though, so keep reading). Likewise, we can also merge changes we make within our subtree directory out to our remote tracking branch, and push them up to the shared library remote.

Confused yet? It will make sense in a minute.

Essentially, we will have told Git to read the contents of the remote tracking branch into a sub-directory on branch master. Git does not store a link between the two; the subtree merge strategy works out the relationship later by comparing directory contents. Graphically, I think of it a little like this:

This is How I Picture the Git Subtree Merge Workflow:

Subtree Illustration

Walk Through:

Let’s walk through the process of creating a subtree within an example project, and then we’ll discuss some things to pay attention to when it comes to merging changes to and from the subtree remote.

I have created an example project as the “main” project for use in the following examples. I will be using a Pdf Merge utility project created for another post on this blog as the “shared library.” The main project is called, unsurprisingly, “MainProject.” The shared library is my oh-so-useful iTextToolsExample. We will assume I have already created MainProject, created a local repository on my development machine, and a remote repo on Github.

Step 1: Create a Remote that Points to the Shared Library Repo

From the Git command line, we set up a remote that points at the repository for the shared library project. I will be using an example project as the main project, and adding a shared library I created for another post as a subtree.

Git Add Remote Syntax:
$ git remote add <local_remote_name> <remote_location>

e.g:

$ git remote add itext_remote https://github.com/xivSolutions/iTextToolsExample.git

bash-add-remote

Step 2: Fetch the Shared Library into the Local Repository:

Next we need to fetch the shared library into our local repository using the remote we just created. Note, we want to use “fetch” here rather than “pull,” as fetch downloads the library’s commits without merging anything into branch master:

Git Fetch Syntax:
$ git fetch <remote_name>

e.g:

$ git fetch itext_remote

bash-fetch-remote

Note the warning after we fetch about “no common commits.” That’s ok – we wouldn’t expect there to be any common history here; we are importing a library for that very reason!
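If you want to convince yourself that fetch leaves master alone, here is a minimal sketch that replays Steps 1 and 2 in a throwaway sandbox. Everything in it is an assumption for illustration: the local "library" and "main" repositories stand in for the real GitHub repos, and the new_repo helper, file names, and commit messages are hypothetical.

```shell
set -e
# Throwaway sandbox; all paths and names below are hypothetical.
SANDBOX=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create a repo at $1 containing one committed file $2.
new_repo() {
  git -c init.defaultBranch=master init -q "$1"
  (cd "$1" && echo "$3" > "$2" && git add "$2" && git commit -q -m "Add $2")
}

new_repo "$SANDBOX/library" PdfMerge.cs "class PdfMerge {}"   # stand-in for the shared library
new_repo "$SANDBOX/main" README.md "Main project"             # stand-in for MainProject

cd "$SANDBOX/main"

# Steps 1 and 2 from the text, pointed at the local stand-in.
git remote add itext_remote "$SANDBOX/library"
git fetch -q itext_remote

git remote -v              # itext_remote is listed for fetch and push
git branch -r              # the fetched history lives under itext_remote/master
git log --oneline master   # master still holds only the main project's commit
ls                         # the working tree still contains only README.md
```

The sandbox directory can simply be deleted when you are done with it.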

Step 3: Check Out a New Tracking Branch Based on the Shared Library Remote

Now, in keeping with the illustration above, we want to get the contents of the shared library tracked in their own remote tracking branch, so we can keep things in sync separate from master:

Checkout to New Branch Syntax:
$ git checkout -b <new_branch_name> <remote_name>/<branch_name>

e.g:

$ git checkout -b itext_branch itext_remote/master

bash-checkout-to-tracking-branch

Now, the new tracking branch points to the root of the shared library project. We can check this by examining the directory contents here in itext_branch:

Directory Listing in iText Branch:

bash-directory-itext-branch

And then checkout master and compare:

Directory Listing in Master:

bash-directory-master-branch
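The same comparison can be made without switching branches at all. The following is a rough sketch, again in a throwaway sandbox where the local repositories, the new_repo helper, and the file names are all hypothetical stand-ins for the real projects.

```shell
set -e
# Throwaway sandbox; all paths and names below are hypothetical.
SANDBOX=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create a repo at $1 containing one committed file $2.
new_repo() {
  git -c init.defaultBranch=master init -q "$1"
  (cd "$1" && echo "$3" > "$2" && git add "$2" && git commit -q -m "Add $2")
}

new_repo "$SANDBOX/library" PdfMerge.cs "class PdfMerge {}"
new_repo "$SANDBOX/main" README.md "Main project"

cd "$SANDBOX/main"
git remote add itext_remote "$SANDBOX/library"
git fetch -q itext_remote

# Step 3: a local branch that tracks the library's master, then back to master.
git checkout -q -b itext_branch itext_remote/master
git checkout -q master

# List the root of each branch without checking it out.
git ls-tree --name-only itext_branch   # PdfMerge.cs
git ls-tree --name-only master         # README.md

# Confirm which remote branch itext_branch tracks.
git rev-parse --abbrev-ref "itext_branch@{upstream}"   # itext_remote/master
```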

Step 4: Read the Library Project into master as a subdirectory

Now is when we perform the magic. We will use git read-tree to read the contents of itext_branch into a subdirectory of master. This is essentially the same as copying it all in, with one small exception: because the new subdirectory matches the tree of itext_branch, Git’s subtree merge strategy can later line the two up when we merge.

Git Read-Tree Syntax:
$ git read-tree --prefix=<subdirectory_name>/ -u <shared_library_branch>

e.g:

$ git read-tree --prefix=itextTools/ -u itext_branch

In the above, <subdirectory_name> can be anything you like – it is going to be the name of a new “folder” or subdirectory within your master branch. The --prefix flag tells git that this will be the name of a directory. I usually use the same name as the original shared library project. Note it is important to include the forward-slash after this. The -u flag tells git to go ahead and update the working tree with the new changes. Also note, we execute this command from within branch master:

Running Git Read-Tree From Branch Master:

bash-read-tree

Now, if we run git status, we can see that a bevy of new files has been added to our master branch, in a directory named (not surprisingly) “itextTools:”

Status of Branch Master After Read-Tree Operation:

bash-status-after-read-tree

Step 5: Commit The New Sub-Tree

After the read-tree operation, the files have been read into our master branch as described above. The important thing here is that the history they share with the remote source repository (as referenced by itext_branch) has NOT come along with them. We can commit the new files as a single commit:

Commit the New Directory and Files:

bash-commit-sub-tree

We have now added the shared iTextTools library as a subtree in our master branch. If we check the contents of our project directory on branch master now, we see the new iTextTools subdirectory:

bash-directory-master-branch-after-add-subtree
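Steps 4 and 5 can be replayed end to end in a sandbox to check that the files arrive while the library history does not. This is only a sketch under stated assumptions: the local repositories, the new_repo helper, the file names, and the commit messages are hypothetical.

```shell
set -e
# Throwaway sandbox; all paths and names below are hypothetical.
SANDBOX=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create a repo at $1 containing one committed file $2.
new_repo() {
  git -c init.defaultBranch=master init -q "$1"
  (cd "$1" && echo "$3" > "$2" && git add "$2" && git commit -q -m "Add $2")
}

new_repo "$SANDBOX/library" PdfMerge.cs "class PdfMerge {}"
new_repo "$SANDBOX/main" README.md "Main project"

cd "$SANDBOX/main"
git remote add itext_remote "$SANDBOX/library"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master

# Step 4: read the library tree into a subdirectory of master.
git read-tree --prefix=itextTools/ -u itext_branch
git status --short          # the library files appear as staged additions

# Step 5: record the new subdirectory as one ordinary commit.
git commit -q -m "Add itextTools as a subtree"

git ls-tree -r --name-only master   # README.md plus itextTools/PdfMerge.cs

# The library's commits were not joined to master's history.
if git merge-base master itext_branch >/dev/null; then
  echo "master and itext_branch share history"
else
  echo "no shared history between master and itext_branch"
fi
```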

Now, what if there are changes introduced to the main shared library source project? How can we keep our sub-project in sync with mainline development on the shared library?

Updating the Nested Subtree Project with Changes from the Shared Library Remote

If we need to pull changes down from the original shared library source into our sub-project, we can simply check out the remote tracking branch itext_branch and use git pull, which will pull down and merge any changes made since we originally fetched the project. Say we (or whoever maintains the itextTools project) added a new class named SomeNewClass and a solution file to the main development repo for the itextTools library:

Pulling Changes down from the Shared Library Remote:

bash-pull-new-changes-from-remote-library

While the above pulls the changes down, we need to be careful with this next step – merging the new change set into our subtree in master. The changes we just pulled down bring with them all of the history from the iText remote repository. In almost all cases, we do not want to merge this history with that of our main project, so we can’t simply do a merge or rebase to get the new changes into master. Instead, we want to use the following:

Squash and No-Commit Change Sets into the Subtree (Syntax):
$ git merge --squash -s subtree --no-commit <source_branch>

e.g:

$ git merge --squash -s subtree --no-commit itext_branch

In the above, the -s flag indicates that you are going to specify which merge strategy git should use in merging changes. This is followed by the strategy itself, which in this case is the subtree strategy. We are also telling git to squash the commits from <source_branch> into a single change set, and to not automatically commit when the merge is complete.

So, now that we have pulled the changes down into our remote tracking branch, we checkout master, and execute the above command:

Merging Changes from Remote Shared Library into Subtree (from branch master):

bash-merge-new-changes-from-remote

If we check the status of master again, we see the changes from the remote iText library repo have been merged into the appropriate subdirectory, and staged for commit:

Status of Master After Subtree Merge Strategy with --squash and --no-commit:

bash-satus-after merge-new-changes-from-remote

From here, simply commit (with a descriptive commit message!).
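Here is a hedged sketch of the whole update cycle in a sandbox. The local repositories, the new_repo helper, the file names, and the commit messages are hypothetical. One addition that is not part of the commands above: Git 2.9 and later refuse to merge branches that share no common ancestor unless --allow-unrelated-histories is passed, so the sketch includes that flag. On an older Git the flag does not exist and should be left off.

```shell
set -e
# Throwaway sandbox; all paths and names below are hypothetical.
SANDBOX=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create a repo at $1 containing one committed file $2.
new_repo() {
  git -c init.defaultBranch=master init -q "$1"
  (cd "$1" && echo "$3" > "$2" && git add "$2" && git commit -q -m "Add $2")
}

new_repo "$SANDBOX/library" PdfMerge.cs "class PdfMerge {}"
new_repo "$SANDBOX/main" README.md "Main project"

# Set up the subtree exactly as in Steps 1 through 5.
cd "$SANDBOX/main"
git remote add itext_remote "$SANDBOX/library"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master
git read-tree --prefix=itextTools/ -u itext_branch
git commit -q -m "Add itextTools as a subtree"

# Upstream work: the library gains a new class.
(cd "$SANDBOX/library" && echo "class SomeNewClass {}" > SomeNewClass.cs \
  && git add SomeNewClass.cs && git commit -q -m "Add SomeNewClass")

# Pull the change down into the tracking branch.
git checkout -q itext_branch
git pull -q --ff-only

# Back on master, squash the change into the subtree without committing.
# --allow-unrelated-histories is an assumption for Git 2.9+ (see the note above).
git checkout -q master
git merge --squash -s subtree --no-commit --allow-unrelated-histories itext_branch

git status --short   # the new class is staged under itextTools/
git commit -q -m "Update itextTools subtree: add SomeNewClass"
```

The resulting commit on master has a single parent, so the library's own commits stay out of the main project's log.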

Updating the Shared Library Remote with Changes made in the Subtree

If you made some changes to the shared library code within your subproject and decide you want to push them out to the main development repo, the process is essentially the same, in reverse. You merge from the subtree directory within the main project (again, explicitly specifying the Subtree Merge strategy), out to the tracking branch that points to the shared library remote. Then push the changes up to the remote.

Say we made some changes to SomeNewClass while working in MainProject and we decided they should be incorporated into mainline development for the itextTools project main repo:

  1. Once you have made changes to the shared library from within branch master of your main project, commit the changes as normal.
  2. Then, checkout the tracking branch that points to the shared library remote.

Now, we use the same basic merge command from before, in the opposite direction:

Syntax to Merge from Subtree to the Remote Tracking Branch:
$ git merge --squash -s subtree --no-commit <source_branch>

e.g:

$ git merge --squash -s subtree --no-commit master

Merging the itextTools Subtree into the Remote Tracking Branch:

bash-merge-new-changes-from-subtree-to-remote

As in the previous example, we can check Git status and see that the changes we made in our sub-tree project have been merged into the remote tracking branch and are now staged for commit:

bash-satus-after merge-new-changes-to-remote

Now commit the changes. I usually state clearly in the commit message that these are changes from <project>/subtree so that, when added to the shared library, I can clearly identify that this set of changes came from the sub-project. Now it’s time to push the changes back up to the main shared library repo.

Pushing from Tracking Branch to Shared Library Remote Syntax:
$ git push <shared_library_remote> <library_tracking_branch>

e.g:

$ git push itext_remote itext_branch

Pushing to the Remote Repository:

bash-push-new-changes-to-shared-remote

Note that using the syntax above will push the changes into a new branch in my remote repo, named (not surprisingly) itext_branch. This will allow me to apply the changes to the master or production branch of the shared library using a merge or pull request (in the case where I am not the maintainer of the shared library).
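The reverse trip can be sketched the same way. As before, the local repositories, the new_repo helper, the Helper.cs file, and the commit messages are hypothetical, and --allow-unrelated-histories is an assumption needed on Git 2.9 and later. The sketch adds a new file inside the subtree rather than editing an existing one, because with no shared ancestor between the two histories Git may report edits to files that exist on both sides as conflicts that need manual resolution.

```shell
set -e
# Throwaway sandbox; all paths and names below are hypothetical.
SANDBOX=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create a repo at $1 containing one committed file $2.
new_repo() {
  git -c init.defaultBranch=master init -q "$1"
  (cd "$1" && echo "$3" > "$2" && git add "$2" && git commit -q -m "Add $2")
}

new_repo "$SANDBOX/library" PdfMerge.cs "class PdfMerge {}"
new_repo "$SANDBOX/main" README.md "Main project"

# Set up the subtree exactly as in Steps 1 through 5.
cd "$SANDBOX/main"
git remote add itext_remote "$SANDBOX/library"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master
git read-tree --prefix=itextTools/ -u itext_branch
git commit -q -m "Add itextTools as a subtree"

# Work done inside the subtree while on master (hypothetical new file).
echo "class Helper {}" > itextTools/Helper.cs
git add itextTools/Helper.cs
git commit -q -m "itextTools: add Helper"

# Merge the subtree out to the tracking branch, squashed and uncommitted.
git checkout -q itext_branch
git merge --squash -s subtree --no-commit --allow-unrelated-histories master

git status --short   # Helper.cs is staged at the library root; README.md is not
git commit -q -m "Add Helper (from MainProject/itextTools subtree)"

# Push to a new branch named itext_branch in the library repository.
git push -q itext_remote itext_branch
git -C "$SANDBOX/library" ls-tree --name-only itext_branch   # Helper.cs and PdfMerge.cs
```

Only the contents of the subtree directory travel to the library; files belonging to the main project stay behind.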

There is More To It

I have brushed through the basics here. There are numerous ways to condense some of the commands I used here, and like most things, there are probably dozens of different viewpoints on how to best manage project dependencies in source control. The Subtree Strategy has been working best for me, in my working context, but I welcome your feedback, and especially, corrections where I might have it wrong, or have missed seeing a better way.

Thanks for reading.

Additional Resources

I am not going to pretend there aren’t any number of good resources from which I learned the above. Most of the information here I assimilated from elsewhere. Here I have striven to simply present my own understanding, in a way which might have helped *ME* when I was trying to figure it out. Here are some of those sources:

  • jatten


    You are so very correct – ooops!

    Thanks for catching that. Will update when I get home from work!

    Also, thanks for reading!


  • jrochkind


    Thanks, this is helpful.

    Possible typo: around sentence " The –s flag tells git to go ahead and update the working tree with the new changes. " — there is no "-s" flag in the examples. Do you mean the "-u" flag?