Image by Ville Miettinen | Some Rights Reserved
The Subtree Merge Workflow offers a relatively painless mechanism for managing shared library source code as a component of a larger project. If you have ever had a shared library, custom control, or other component under development that you also wanted to use within one or more dependent projects, you no doubt understand what I mean.
This post is a condensed version of a longer post, which has more examples and is aimed at those less familiar with Git. If, like me, you have a little trouble following what is going on here, check out the other post first. This one is more of a reference.
Sub-tree Merge – The Best of Both Worlds, with Only Half the Pain
In searching for an optimal way to handle the “nested libraries” problem, I came across a few articles on the Sub-tree Merge workflow in Git. In this workflow, we accomplish the following (this represents my understanding of what is going on – if you know differently, please let me know in the comments!):
- Maintain a tracking branch which tracks the remote shared library project, from which we can pull in changes made in mainline development of the shared library. This remote tracking branch tracks and represents the history of the shared library, distinctly from the history of the main project.
- From this remote, create a “subtree” as a subdirectory within the master branch (or whichever branch we designate) of the main project. The new sub-directory contains a copy of the shared library source code. This subtree does not bring with it the history of commits in the actual shared library. Instead, it joins and shares history with the main project.
- Once this is done, we can pull new change sets down from the shared library remote as needed, and merge them into our subtree (there’s a trick to this, though, so keep reading). Likewise, we can also merge changes we make within our subtree directory out to our remote tracking branch, and push them up to the shared library remote.
My Mental Picture of the Subtree Workflow:
In my head, I picture the subtree relationships something like THIS:
Create a Subtree in your Main Project
In the examples below, I will use MainProject as the name of the main project, in which I want to utilize a shared library which is also under development. I will use the iTextTools library from a previous post as the example shared library.
Step 1: Point a Remote in the Main Project at the Shared Library Repository:
$ git remote add <library_remote_name> <library_remote_location>
$ git remote add itext_remote https://github.com/xivSolutions/iTextToolsExample.git
Step 2: Fetch the Shared Library into the Local Repository:
$ git fetch <library_remote_name>
$ git fetch itext_remote
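To see what the fetch actually does, here is a minimal sketch using two throwaway local repositories in place of the real ones. The scratch directory, the file names Library.txt and Main.txt, and the local path used as the remote location are all assumptions made for illustration. The point is that the fetch adds a remote-tracking ref for the library but leaves the files in the working tree untouched.

```shell
#!/bin/sh
set -e

# Hypothetical scratch area; both repositories are stand-ins for illustration.
work=$(mktemp -d)

# A stand-in shared library with one commit on master.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "library code" > Library.txt
git add Library.txt
git commit -q -m "Initial library commit"

# A stand-in MainProject with its own history.
git init -q "$work/MainProject"
cd "$work/MainProject"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "main project code" > Main.txt
git add Main.txt
git commit -q -m "Initial main project commit"

# Steps 1 and 2: add the remote (a local path here) and fetch it.
git remote add itext_remote "$work/lib"
git fetch -q itext_remote

# The library now shows up as a remote-tracking branch...
git branch -r
# ...but the working tree still holds only the main project's files.
ls
```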
Step 3: Checkout a New Tracking Branch Based on the Shared Library Remote:
$ git checkout -b <new_branch_name> <remote_name>/<branch_name>
$ git checkout -b itext_branch itext_remote/master
Step 4: Read the Library Project Into Master as a Subtree Directory:
Before performing this step, switch back to branch master (or whatever your chosen target branch is). The read-tree command is going to read the contents of the remote library tracking branch into the branch you execute the read-tree command from.
The --prefix option establishes the name of the subdirectory into which the contents of the read will be placed.
The -u flag tells Git to update the contents of the working tree once the read is complete. Files will be staged for commit when the read is complete.
$ git read-tree --prefix=<subdirectory_name>/ -u <shared_library_branch>
$ git read-tree --prefix=iTextTools/ -u itext_branch
Step 5: Commit the New Files:
Commit as normal. From this point forward, the files added in the new subdirectory will be tracked as part of this branch, and share history with MainProject. However, Git also knows about the subtree directory. We can leverage this information to merge changes between the local version of our shared library and the remote library repository, without merging the separate project histories (which have nothing in common other than the use of the library project source code).
As we will see, there are a few things to keep in mind when doing so, but overall it is pretty simple.
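The five steps above can be run end to end as a rough sketch. This one uses two throwaway local repositories rather than the real GitHub remote, so the scratch directory, the file names Library.txt and Main.txt, and the commit messages are assumptions made for illustration. The remote, branch, and subdirectory names match the ones used in this post.

```shell
#!/bin/sh
set -e

# Hypothetical scratch area; the two repositories below are stand-ins,
# not the real iTextToolsExample repository.
work=$(mktemp -d)

# A stand-in shared library with a single commit on master.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "library code" > Library.txt
git add Library.txt
git commit -q -m "Initial library commit"

# A stand-in MainProject with its own, unrelated history.
git init -q "$work/MainProject"
cd "$work/MainProject"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "main project code" > Main.txt
git add Main.txt
git commit -q -m "Initial main project commit"

# Steps 1 to 3: add the remote, fetch it, and create the tracking branch.
git remote add itext_remote "$work/lib"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master

# Step 4: back on master, read the library into the iTextTools/ subdirectory.
git checkout -q master
git read-tree --prefix=iTextTools/ -u itext_branch

# Step 5: the library files are already staged, so commit as normal.
git commit -q -m "Add iTextTools as a subtree"

# Inspect the result: master has two commits and now contains iTextTools/.
git log --oneline master
git ls-files
```

Note that the new commit on master is an ordinary commit, not a merge, which is why master and itext_branch still have no commits in common afterwards.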
Update the Library Sub-Project from the Shared Library Remote
If changes are made to the shared library at the source repository, we will likely want to bring them into our sub-project so that the subproject is up-to-date with the latest code. The process is a little different than the standard pull, fetch/merge, or rebase process we are accustomed to, because most of the time, we don’t want to bring the commit history from the shared library remote into our main project (as they are unrelated and share no common commits).
Before performing this update, I recommend committing any other local changes you may have introduced in MainProject so that, when you commit the imported changes retrieved for the shared library code, it will stand as its own unique commit.
Step 1: Checkout the Shared Library Remote Tracking Branch:
$ git checkout <library_branch_name>
$ git checkout itext_branch
Step 2: Pull from Remote Library Repository:
When you set up the remote tracking branch originally, you also pointed it at the shared library remote repo, so Git understands that git pull from this branch means to fetch/merge from the proper location:
$ git pull
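If you want to confirm where the tracking branch points, and see what the pull brought in before merging anything, something like the following sketch may help. It rebuilds the setup from earlier in throwaway local repositories. The new_repo helper, the scratch directory, and the file names are hypothetical and exist only for this example. The comparison at the end uses git diff on two trees: the iTextTools directory as committed on master, and the root of itext_branch.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)   # hypothetical scratch area

# new_repo is a hypothetical helper: create a repository at $1 on branch master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Example User"
    git -C "$1" config user.email "user@example.com"
}

# Stand-in shared library.
new_repo "$work/lib"
cd "$work/lib"
echo "version 1" > Library.txt
git add Library.txt
git commit -q -m "Initial library commit"

# Stand-in MainProject with the library read in as a subtree.
new_repo "$work/MainProject"
cd "$work/MainProject"
echo "main project code" > Main.txt
git add Main.txt
git commit -q -m "Initial main project commit"
git remote add itext_remote "$work/lib"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master
git checkout -q master
git read-tree --prefix=iTextTools/ -u itext_branch
git commit -q -m "Add iTextTools as a subtree"

# Simulate mainline development in the shared library.
cd "$work/lib"
echo "version 2" > Library.txt
git commit -q -a -m "Update library"

# Steps 1 and 2: check out the tracking branch, confirm its upstream, pull.
cd "$work/MainProject"
git checkout -q itext_branch
upstream=$(git rev-parse --abbrev-ref "itext_branch@{upstream}")
echo "itext_branch tracks: $upstream"
git pull -q

# Compare the subtree directory on master with the updated tracking branch.
git diff --stat master:iTextTools itext_branch
```

At this point the subtree on master still holds the old library code, and the diff lists the files the merge in Step 4 will need to bring across.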
Step 3: Checkout Target Branch
Before we perform any merging, we obviously need to make sure we are back on the branch we want to merge the changes into:
$ git checkout <target_branch>
$ git checkout master
Step 4: Merge Changesets without Merging History
This is where things differ a little from a normal merge or rebase. We want to merge all of the changes, but we don’t want to bring any commit history with them. Also, because we are targeting our subtree directory, we need to explicitly specify the subtree merge strategy. In the commands below, the -s flag tells Git we will be explicitly prescribing a merge strategy, followed by the strategy to use (obviously, subtree in this case):
$ git merge --squash -s subtree --no-commit <library_branch>
$ git merge --squash -s subtree --no-commit itext_branch
Note that Git 2.9 and later stop this merge with “refusing to merge unrelated histories”, because the target branch and the library tracking branch share no commits. On those versions, add the --allow-unrelated-histories option to the merge command.
Step 5: Commit:
Now the sub-project version of the shared library is up-to-date with the remote. Commit as normal. I recommend using a descriptive commit message indicating that this commit is specifically related to updating the subtree project.
Update the Shared Library Remote with Changes from the Library Sub-Project
It may happen that while working in your main project, you introduce modifications or additional features to the shared library code that you decide should be integrated into the Shared Library source repository. Doing so is essentially the reverse of the preceding. Simply merge the changes from your working branch out to the shared library remote tracking branch (again, without bringing the local project history along), then push to the shared library remote (and/or submit a pull request, depending upon who maintains the shared library repository).
Obviously, prior to performing the following steps, commit your changes in the working tree on branch master (or whatever your working branch is) in your main project.
Step 1: Checkout the Shared Library Remote Tracking Branch
$ git checkout <library_branch>
$ git checkout itext_branch
Step 2: Merge Changes from the Library Sub-Project Without Merging History
As with the previous situation, in all but some very odd circumstances we don’t want to merge the two projects’ histories together, so we use the --squash option when we merge. Also as before, we explicitly specify the subtree merge strategy:
$ git merge --squash -s subtree --no-commit <source_branch>
$ git merge --squash -s subtree --no-commit master
Step 3: Commit the Merged Changes:
Now, assuming there were no merge conflicts, commit as normal, again using a descriptive commit message identifying the commit as a merge from a subtree in an outside project.
Step 4: Push Changes to the Shared Library Repo
When we go to push changes from our remote tracking branch to the shared library remote, the command below will actually push our changes on a new branch (named, not surprisingly, “itext_branch”). For me, this is the behavior I want. This way, I (or whoever is the maintainer of the remote library) can integrate the changes into the mainline of development from within the library remote. In other words, we are not merging directly back into master (rarely how we would want to do things under these circumstances). Obviously, if we are NOT the maintainer of the shared library, this is where a pull request would come in instead of a direct merge.
$ git push <library_remote> <library_branch>
$ git push itext_remote itext_branch
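To check the claim that the push lands on a new branch rather than on the library's master, here is a small sketch against a throwaway local repository. The new_repo helper, the scratch directory, and the file names are hypothetical. The commit made directly on itext_branch is only a stand-in for the merged changes from Steps 1 to 3, so that the push has something to send.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)   # hypothetical scratch area

# new_repo is a hypothetical helper: create a repository at $1 on branch master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Example User"
    git -C "$1" config user.email "user@example.com"
}

# Stand-in shared library.
new_repo "$work/lib"
cd "$work/lib"
echo "library code" > Library.txt
git add Library.txt
git commit -q -m "Initial library commit"

# Stand-in MainProject with a tracking branch for the library.
new_repo "$work/MainProject"
cd "$work/MainProject"
echo "main project code" > Main.txt
git add Main.txt
git commit -q -m "Initial main project commit"
git remote add itext_remote "$work/lib"
git fetch -q itext_remote
git checkout -q -b itext_branch itext_remote/master

# Stand-in for Steps 1 to 3: a commit on the tracking branch that
# represents changes merged out of the subtree.
echo "fix made in MainProject" >> Library.txt
git commit -q -a -m "Merge changes from the MainProject subtree"

# Step 4: push the tracking branch to the shared library remote.
git push -q itext_remote itext_branch

# The remote now has both master and itext_branch; master is untouched.
git ls-remote --heads itext_remote
```

Because the branch name is given without a destination, Git pushes to a branch of the same name on the remote and creates it if it does not exist yet.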
I do not claim to have the authoritative post on this topic by any means. I have written this, and the longer-form post, as much as a learning exercise and reference for myself as anything else. I have sought to present the information in a way which would have helped me understand it when I was first learning some of Git’s more advanced features and techniques.
I am aware that there are more condensed versions of some of the commands used, and quite possibly better ways of doing this. If you have read this far, and have noted some incredible stupidity on my part herein, please do point it out in the comments.
As I learned this, I referred frequently to the following, among other sources:
- The Longer Version of this Post with Command Line Examples
- Pro Git (Online Version) by Scott Chacon
- Git Subtree Merge Workflow Proposal (On Github)
- Git: Subtree vs Submodule