NOTE: This post is mainly targeted at newer git users who may want a more detailed look at what is happening. If you consider yourself comfortable with git, I have also created a “reference” version that skips most of the narrative:
The Quick Reference
- Link to condensed version HERE
Also, some navigation aids, because this is a long-ish post:
- A Handy Diagram
- Walk-Through – Creating a Nested Sub Project
- Update Sub-Project with Changes from the Shared Library Remote
- Update the Shared Library Remote with Changes from your Sub-Project
Problems introduced by “Nested” library projects in a Git Repository
I often find myself building out a project and incorporating another library I have been working on (such as a custom control) as a dependency. The library in question may, in fact, be consumed by other projects as well (that IS the purpose of creating such libraries, right?). Just as often, the project I am referencing as a dependency in my main project will itself be in a state of evolution. A problem arises related to either or both of the following:
- The newly introduced dependency (the “sub-project” e.g. custom control or such) is also being actively developed, and I will want the option to pull in any changes as that library evolves.
- I may, in the course of working on my “main” project, make some modifications to the sub-project, some or all of which I decide should be incorporated into the main line of development of the shared library.
There are a few common ways to handle this:
- Set a reference to the binaries of the shared library from within my main project, perform all modifications to the shared library from within that project, and re-build so that changes are reflected in the binary outputs.
- Use Git “submodules” to refer to a specific commit within the library project, and manage the updating of these commit pointers as the library project evolves.
- Copy the dependency project source into the main project. Yuck. Isn’t the whole idea of libraries to avoid this type of thing?
I have not used Git submodules much, but my limited experience with them indicates to me that outside of certain circumstances, they are a pain. Equally unsatisfactory under the conditions I described in the first paragraph is managing a standard binary reference (especially in cases where I may need to customize the shared library within my main project, while keeping it up to date with the development repo). The third option is not even worth considering.
In searching for an optimal way to handle the “nested libraries” problem, I came across a few articles on the Subtree Merge workflow in Git. In this workflow, we accomplish the following (this represents my understanding of what is going on – if you know differently, please let me know in the comments!):
- Maintain a tracking branch which tracks the remote shared library project, from which we can pull in changes made in mainline development of the shared library. This remote tracking branch tracks and represents the history of the shared library, distinct from the history of the main project.
- From this remote, create a “subtree” as a subdirectory within the master branch (or whichever branch we designate) of the main project. The new sub-directory contains a copy of the shared library source code. This subtree does not bring with it the history of commits in the actual shared library. Instead, it joins and shares history with the main project.
- Once this is done, we can pull new change sets down from the shared library remote as needed, and merge them into our sub tree (there’s a trick to this, though, so keep reading). Likewise, we can also merge changes we make within our subtree directory out to our remote tracking branch, and push them up to the shared library remote.
Essentially, we will have told Git to read the contents of the remote tracking branch into a sub-directory on branch master, and remember that this sub-directory represents a subtree based on the remote tracking branch. Graphically, I think of it a little like this:
This is How I Picture the Git Subtree Merge Workflow:
Let’s walk through the process of creating a subtree within an example project, and then we’ll discuss some things to pay attention to when it comes to merging changes to and from the subtree remote.
I have created an example project as the “main” project for use in the following examples. I will be using a Pdf Merge utility project created for another post on this blog as the “shared library.” The main project is called, unsurprisingly, “MainProject.” The shared library is my oh-so-useful iTextToolsExample. We will assume I have already created MainProject, created a local repository on my development machine, and a remote repo on Github.
From the Git command line, we set up a remote that points at the repository for the shared library project.
Git Add Remote Syntax:
$ git remote add <local_remote_name> <remote_location>
$ git remote add itext_remote https://github.com/xivSolutions/iTextToolsExample.git
Next we need to fetch the shared library's commits into our local repository using the remote we just created. Note that we want to use “fetch” rather than “pull” here, as we don’t want to merge what we download into branch master:
Git Fetch Syntax:
$ git fetch <remote_name>
$ git fetch itext_remote
Note the warning after we fetch about “no common commits.” That’s ok – we wouldn’t expect there to be any common history here; we are importing a library for that very reason!
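To try the remote and fetch steps without touching a real project, here is a minimal sketch that builds two throwaway repositories in a temporary directory. The local `iTextToolsExample` directory stands in for the GitHub repository, and the `new_repo` helper and the file names `PdfMerge.cs` and `Program.cs` are hypothetical placeholders of mine, not part of the original projects.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical helper: create a repo at $1 containing a single file $2.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
  git -C "$1" config user.name "Example Dev"
  git -C "$1" config user.email "dev@example.com"
  echo "placeholder" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Initial commit"
}

new_repo "$sandbox/iTextToolsExample" PdfMerge.cs   # stand-in for the shared library
new_repo "$sandbox/MainProject" Program.cs          # stand-in for the main project

cd "$sandbox/MainProject"
git remote add itext_remote "$sandbox/iTextToolsExample"
git fetch -q itext_remote

git remote -v    # lists itext_remote with its fetch and push locations
git branch -r    # lists the remote-tracking ref itext_remote/master
git branch       # still only master: fetch did not merge or check out anything
```

After the fetch, the library's commits are available locally under `itext_remote/master`, but the working directory of `master` is unchanged.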
Now, in keeping with the illustration above, we want to get the contents of the shared library tracked in their own remote tracking branch, so we can keep things in sync separate from master:
Checkout to New Branch Syntax:
$ git checkout -b <new_branch_name> <remote_name>/<branch_name>
$ git checkout -b itext_branch itext_remote/master
Now, the new tracking branch points to the root of the shared library project. We can check this by examining the directory contents here in itext_branch:
Directory Listing in iText Branch:
And then checkout master and compare:
Directory Listing in Master:
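The comparison between the two branches can be reproduced with a small sketch like the one below. It again uses throwaway repositories in a temporary directory; the `new_repo` helper and the file names are hypothetical stand-ins rather than the real project contents.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical helper: create a repo at $1 containing a single file $2.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.name "Example Dev"
  git -C "$1" config user.email "dev@example.com"
  echo "placeholder" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Initial commit"
}

new_repo "$sandbox/iTextToolsExample" PdfMerge.cs   # stand-in for the shared library
new_repo "$sandbox/MainProject" Program.cs          # stand-in for the main project

cd "$sandbox/MainProject"
git remote add itext_remote "$sandbox/iTextToolsExample"
git fetch -q itext_remote

# Create a local branch that tracks the library's master branch.
git checkout -q -b itext_branch itext_remote/master
ls                      # shows only the library's files
git checkout -q master
ls                      # shows only the main project's files

# Confirm which upstream the new branch is tracking.
git rev-parse --abbrev-ref 'itext_branch@{upstream}'
```

The two listings differ completely because the branches share no history; each one is the root of a separate project.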
Now is when we perform the magic. We will use git read-tree to read the contents of itext_branch into a subdirectory of master. This is essentially the same as copying it all in, with one small exception: Git can later relate this subdirectory back to the remote tracking branch itext_branch when we merge using the subtree strategy.
Git Read-Tree Syntax:
$ git read-tree --prefix=<subdirectory_name>/ -u <shared_library_branch>
$ git read-tree --prefix=itextTools/ -u itext_branch
In the above, <subdirectory_name> can be anything you like; it is going to be the name of a new “folder” or subdirectory within your master branch. The --prefix flag tells Git the directory under which the contents should be read. I usually use the same name as the original shared library project. Note that it is important to include the trailing forward slash. The -u flag tells Git to go ahead and update the working tree with the new changes. Also note that we execute this command from within branch master.
Running Git Read-Tree From Branch Master:
Now, if we run git status, we can see that a bevy of new files have been added to our master branch, in a directory named (not surprisingly) “iTextTools:”
Status of Branch Master After Read-Tree Operation:
After you have performed the read-tree operation, the files have been read into our master branch as described above. The important thing here is that the history they share with the remote source repository (as referenced by the itext_branch) has NOT. We can commit this as a single commit:
Commit the New Directory and Files:
We have now added the shared iTextTools library as a subtree in our master branch. If we check the contents of our project directory on branch master now, we see the new iTextTools subdirectory:
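The read-tree and commit steps can be tried end to end with the sketch below. As before, the repositories are throwaway stand-ins in a temporary directory, and the `new_repo` helper, file names, and commit message are hypothetical.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical helper: create a repo at $1 containing a single file $2.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.name "Example Dev"
  git -C "$1" config user.email "dev@example.com"
  echo "placeholder" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Initial commit"
}

new_repo "$sandbox/iTextToolsExample" PdfMerge.cs   # stand-in for the shared library
new_repo "$sandbox/MainProject" Program.cs          # stand-in for the main project

cd "$sandbox/MainProject"
git remote add itext_remote "$sandbox/iTextToolsExample"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master

# Read the library branch into a subdirectory of master and update the working tree.
git read-tree --prefix=itextTools/ -u itext_branch
git status --short      # the library files show up as staged additions under itextTools/

git commit -q -m "Add iTextTools as a subtree"
ls itextTools           # the library files now live inside the main project
git log --oneline       # master has its own commits only, not the library's history
```

Counting the commits on `master` afterwards gives two: the initial commit and the single commit that added the subtree.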
Now, what if there are changes introduced to the main shared library source project? How can we keep our sub-project in sync with mainline development on the shared library?
If we need to pull changes down from the original shared library source into our sub-project, we can simply checkout the remote tracking branch itext_branch and use git pull, which will pull down and merge any changes made since we originally fetched the project. Say we (or whoever maintains the itextTools project) added a new class named SomeNewClass and a solution file to the main development repo for the shared library:
Pulling Changes down from the Shared Library Remote:
While the above pulls the changes down, we need to be careful with this next step – merging the new change set into our subtree in master. The changes we just pulled down bring with them all of the history from the iText remote repository. In almost all cases, we do not want to merge this history with that of our main project, so we can’t simply do a merge or rebase to get the new changes into master. Instead, we want to use the following:
Squash and No-Commit Change Sets into the Subtree (Syntax):
$ git merge --squash -s subtree --no-commit <source_branch>
$ git merge --squash -s subtree --no-commit itext_branch
In the above, the -s flag indicates that you are going to specify which merge strategy Git should use in merging changes. This is followed by the strategy itself, which in this case is the subtree strategy. We are also telling Git to squash the commits from <source_branch> into a single change set, and to not automatically commit when the merge is complete.
So, now that we have pulled the changes down into our remote tracking branch, we checkout master, and execute the above command:
Merging Changes from Remote Shared Library into Subtree (from branch master):
If we check the status of master again, we see the changes from the remote iText library repo have been merged into the appropriate subdirectory, and staged for commit:
Status of Master After Subtree Merge Strategy with --squash and --no-commit:
From here, simply commit (with a descriptive commit message!).
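Here is a hedged sketch of the whole update cycle, using throwaway repositories in a temporary directory. The `new_repo` helper, file names, and commit messages are hypothetical. One assumption to flag loudly: on Git 2.9 and later, a merge between branches that share no commits is refused by default, so the sketch adds `--allow-unrelated-histories` to the merge command shown above; that flag is my addition and is not recognised by older Git versions.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical helper: create a repo at $1 containing a single file $2.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.name "Example Dev"
  git -C "$1" config user.email "dev@example.com"
  echo "placeholder" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Initial commit"
}

lib="$sandbox/iTextToolsExample"                    # stand-in for the shared library
new_repo "$lib" PdfMerge.cs
new_repo "$sandbox/MainProject" Program.cs          # stand-in for the main project

# Set up the subtree exactly as in the walk-through.
cd "$sandbox/MainProject"
git remote add itext_remote "$lib"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master
git read-tree --prefix=itextTools/ -u itext_branch
git commit -q -m "Add iTextTools as a subtree"

# Simulate mainline development on the shared library.
echo "new class" > "$lib/SomeNewClass.cs"
git -C "$lib" add .
git -C "$lib" commit -q -m "Add SomeNewClass"

# Pull the change into the tracking branch, then squash it into the subtree.
git checkout -q itext_branch
git pull -q
git checkout -q master
# --allow-unrelated-histories is an assumption for Git 2.9+ (see the note above).
git merge -q --squash -s subtree --no-commit --allow-unrelated-histories itext_branch
git status --short      # the new file is staged under itextTools/
git commit -q -m "Update iTextTools subtree from shared library"
```

In this sketch the upstream change is a brand-new file. Because the two branches share no common ancestor, my understanding is that edits to a file that already exists on both sides may surface as conflicts that need to be resolved by hand.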
If you made some changes to the shared library code within your subproject and decide you want to push them out to the main development repo, the process is essentially the same, in reverse. You merge from the subtree directory within the main project (again, explicitly specifying the Subtree Merge strategy), out to the tracking branch that points to the shared library remote. Then push the changes up to the remote.
Say we made some changes to SomeNewClass while working in MainProject and we decided they should be incorporated into mainline development for the itextTools project main repo:
- Once you have made changes to the shared library from within branch master of your main project, commit the changes as normal.
- Then, checkout the tracking branch that points to the shared library remote.
Now, we use the same basic merge command from before, in the opposite direction:
Syntax to Merge from Subtree to the Remote Tracking Branch:
$ git merge --squash -s subtree --no-commit <source_branch>
$ git merge --squash -s subtree --no-commit master
Merging the itextTools Subtree into the Remote Tracking Branch:
As in the previous example, we can check Git status and see that the changes we made in our sub-tree project have been merged into the remote tracking branch and are now staged for commit:
Now commit the changes. I usually state clearly in the commit message that these are changes from <project>/subtree so that, when added to the shared library, I can clearly identify that this set of changes came from the sub-project. Now it’s time to push the changes back up to the main shared library repo.
Pushing from Tracking Branch to Shared Library Remote Syntax:
$ git push <shared_library_remote> <library_tracking_branch>
$ git push itext_remote itext_branch
Pushing to the Remote Repository:
Note that using the syntax above will push the changes into a new branch in my remote repo, named (not surprisingly) itext_branch. This will allow me to apply the changes to the master or production branch of the shared library using a merge or pull request (in the case where I am not the maintainer of the shared library).
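The reverse direction can be sketched the same way. The repositories below are throwaway stand-ins in a temporary directory, and the `new_repo` helper, the `Helper.cs` file, and the commit messages are hypothetical. As an assumption for Git 2.9 and later, the merge passes `--allow-unrelated-histories`, which is not part of the command shown above. The change made in the subtree here is a new file; an edit to a file that exists on both branches may instead produce a conflict, since the branches share no common ancestor.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical helper: create a repo at $1 containing a single file $2.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.name "Example Dev"
  git -C "$1" config user.email "dev@example.com"
  echo "placeholder" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Initial commit"
}

lib="$sandbox/iTextToolsExample"                    # stand-in for the shared library
new_repo "$lib" PdfMerge.cs
new_repo "$sandbox/MainProject" Program.cs          # stand-in for the main project

# Set up the subtree exactly as in the walk-through.
cd "$sandbox/MainProject"
git remote add itext_remote "$lib"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master
git read-tree --prefix=itextTools/ -u itext_branch
git commit -q -m "Add iTextTools as a subtree"

# Change the library from inside the main project and commit as normal.
echo "helper" > itextTools/Helper.cs
git add itextTools/Helper.cs
git commit -q -m "Add Helper to iTextTools subtree"

# Merge the subtree back into the tracking branch, then push it up.
git checkout -q itext_branch
# --allow-unrelated-histories is an assumption for Git 2.9+ (see the note above).
git merge -q --squash -s subtree --no-commit --allow-unrelated-histories master
git status --short      # only the library change is staged, not Program.cs
git commit -q -m "Changes from MainProject/subtree"
git push -q itext_remote itext_branch

git -C "$lib" branch    # the library repo now has a new branch named itext_branch
```

Only the contents of the `itextTools/` subdirectory travel to the library; files that belong to the main project stay behind.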
There is More To It
I have brushed through the basics here. There are numerous ways to condense some of the commands I used here, and like most things, there are probably dozens of different viewpoints on how to best manage project dependencies in source control. The Subtree Strategy has been working best for me, in my working context, but I welcome your feedback, and especially, corrections where I might have it wrong, or have missed seeing a better way.
Thanks for reading.
I am not going to pretend there aren’t any number of good resources from which I learned the above. Most of the information here I assimilated from elsewhere. Here I have striven to simply present my own understanding, in a way which might have helped *ME* when I was trying to figure it out. Here are some of those sources:
- Condensed Version of This Post for Reference
- Pro Git (Online Version) by Scott Chacon
- Git Subtree Workflow Proposal (Github)
- Fluid Motion – Git Subtree vs Submodule