Week 8

Day 5 - "Shhh....we're in a library"

Nuclear fusion

OK, so we are not quite at the stage of nuclear physics, but it would be nice to know how to bring our library back into our repository. Git offers a tool called git submodule. This tool allows you to link to a branch of a remote repository and store it under a subdirectory of the project. It has some nuances which must be learnt, but it can be very useful. Let us add our testing suite from the subrepo repository into the directory called tester in our main coderepo repository. First we must remove our existing tester directory.
john@satsuki:~/coderepo$ git checkout master
Already on 'master'
john@satsuki:~/coderepo$ git rm tester/*
rm 'tester/newfile1'
rm 'tester/newfile2'
rm 'tester/newfile3'
rm 'tester/test.sh'
john@satsuki:~/coderepo$ git commit -a -m 'Removed tester - will be replaced by submodule'
[master 5698499] Removed tester - will be replaced by submodule
4 files changed, 0 insertions(+), 20 deletions(-)
delete mode 100644 tester/newfile1
delete mode 100644 tester/newfile2
delete mode 100644 tester/newfile3
delete mode 100755 tester/test.sh
john@satsuki:~/coderepo$

We need to define what a submodule actually is. Submodules are tricky to understand, and people often use them once and conclude that they are more trouble than they are worth. However, if you take some time to understand what a submodule really is, they can be very useful to you. A submodule is the inclusion of a repository branch at a specific commit. It is not intended to track the development of the upstream library or module (see the callout box for an explanation of upstream).

Terminology - Upstream

Upstream refers to the source of a project which may have one or more derivatives which are also distributed. Take the package that was used to build this book for example, LaTeX. LaTeX is distributed by the people who developed it as open source software, but it is also included with a number of Linux distributions. The location of the software created by the LaTeX developers is referred to as the upstream project. The projects which include it within their own are what is referred to as downstream. Think of it like a river which flows from the source further upstream.

As we will see, though it can be a little longwinded to actually change the version of the code that the submodule refers to, it actually makes a lot of sense to handle them in this way. If the code in the submodule is being included in your repository, you do not want to run the risk of a change upstream resulting in a broken build for your project. This is why submodules always refer to a single commit.

Let us go ahead, create a submodule and then discuss the steps we have taken.
john@satsuki:~/coderepo$ git submodule add /home/john/subrepo tester
Cloning into tester...
done.
john@satsuki:~/coderepo$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: .gitmodules
# new file: tester
#
john@satsuki:~/coderepo$ git commit -a -m 'Added submodule (subrepo)'
[master 2aadc11] Added submodule (subrepo)
2 files changed, 4 insertions(+), 0 deletions(-)
create mode 100644 .gitmodules
create mode 160000 tester
john@satsuki:~/coderepo$

As you can see we had to perform a number of steps before we obtained the source for the subrepo library in our tester directory. We had to begin by using git submodule to add the upstream repository. The upstream repository is really just like any remote repository we have been using, but we will use the terminology upstream to make a distinction. The command git submodule add /home/john/subrepo tester creates a special file in the root of our project called .gitmodules, plus it clones the upstream repository into the folder we specified, in this case tester.

Notice that when we ran git status, we saw two new entries, one for .gitmodules and one for tester. Next we have to commit those entries using the standard git commit command. When we do, the commit output shows mode 160000 in front of tester. This special mode tells Git to treat the entry as a submodule rather than as an ordinary file or directory.
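To see what mode 160000 stands for, the sketch below rebuilds a tiny version of this setup in a throwaway directory and inspects the committed tree entry with git ls-tree. The sandbox paths, the demo identity and the README file are assumptions made for illustration only. The -c protocol.file.allow=always option is there because newer Git releases refuse to clone a submodule from a local path unless the file transport is explicitly allowed.

```shell
#!/bin/sh
set -e
# Throwaway sandbox so nothing here touches a real repository.
sandbox=$(mktemp -d)
cd "$sandbox"
# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream subrepo.
git init -q subrepo
(cd subrepo && echo "test suite" > test.sh && git add test.sh &&
    git commit -q -m 'Moved testing suite')

# Stand-in for the coderepo superproject.
git init -q coderepo
cd coderepo
echo "main project" > README
git add README
git commit -q -m 'Initial commit'

# Add the submodule and commit .gitmodules plus the tester entry.
git -c protocol.file.allow=always submodule --quiet add "$sandbox/subrepo" tester
git commit -q -m 'Added submodule (subrepo)'

# The tree entry for tester: mode, object type, commit ID, path.
git ls-tree HEAD tester
entry_mode=$(git ls-tree HEAD tester | awk '{print $1}')
entry_type=$(git ls-tree HEAD tester | awk '{print $2}')
recorded=$(git ls-tree HEAD tester | awk '{print $3}')
upstream_tip=$(git -C "$sandbox/subrepo" rev-parse HEAD)
echo "recorded commit: $recorded"
echo "upstream tip:    $upstream_tip"
```

The entry is of type commit rather than blob or tree. The superproject stores no files for tester, only the ID of the submodule commit it expects to find there.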

Though the submodule has now been added, it has not yet been initialised. To do this, we run our next set of steps.
john@satsuki:~/coderepo$ git submodule init
Submodule 'tester' (/home/john/subrepo) registered for path 'tester'
john@satsuki:~/coderepo$ git submodule update
john@satsuki:~/coderepo$

Now our submodule has been added and initialised. The update command is used to ensure that the directory tester contains the version of the submodule that we committed earlier.
john@satsuki:~/coderepo$ cd tester/
john@satsuki:~/coderepo/tester$ ls
newfile1 newfile2 newfile3 test.sh
john@satsuki:~/coderepo/tester$ git log --format=oneline
590e0eb79bc5ba0bc09f611392e643f676b00a04 Work on tester nf3
785b86d877d2a5c0679d98181a23d06ed2ba7652 Work on tester nf2
1ff89f787438f081a0d74de2d26eb2d831c9c738 Work on tester nf1
a5a0d9762dd4b50d8f3228e37b315f6056d5a034 Moved testing suite
john@satsuki:~/coderepo/tester$ cd ..
john@satsuki:~/coderepo$

Looking in the directory we can see two things. The first is that the files present in the subrepo upstream project have now been added. The second may appear a little surprising to begin with. The git log command actually shows a log for the upstream project, not for the local root project stored in coderepo. In all honesty, the submodule repository is actually just a clone of the upstream project, with a few subtle differences.

The information about which upstream URL to use for the project can be found in the .gitmodules file which we committed earlier. Below is an example of what the file looks like in our current repository.
john@satsuki:~/coderepo$ cat .gitmodules
[submodule "tester"]
path = tester
url = /home/john/subrepo
john@satsuki:~/coderepo$
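Because .gitmodules uses the same syntax as Git's other configuration files, it can also be queried with git config rather than read by eye. The sketch below writes a copy of the file shown above into a temporary directory (an assumption made so that the example can run anywhere) and reads the two values back.

```shell
#!/bin/sh
set -e
# Work in a temporary directory; git config -f does not need a repository.
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the .gitmodules file from the transcript above.
printf '[submodule "tester"]\n\tpath = tester\n\turl = /home/john/subrepo\n' > .gitmodules

# Read individual keys; the section name is submodule.<name>.
sub_path=$(git config -f .gitmodules --get submodule.tester.path)
sub_url=$(git config -f .gitmodules --get submodule.tester.url)
echo "path: $sub_path"
echo "url:  $sub_url"
```

In a real superproject the same two git config commands can be run from the root of the working tree, where .gitmodules lives.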

Changes down the river

So what happens when we want to pull in changes from the upstream project? Well, you can make your submodule point to whatever commit you like and stay there. As long as you commit that choice in the superproject, Git will always allow you to return to that point using the git submodule update command.

Let us take a look at how we could pull in some changes into our tester submodule. First, we are going to make a change to our upstream project.
john@satsuki:~/coderepo$ cd ..
john@satsuki:~$ cd subrepo
john@satsuki:~/subrepo$ ls
newfile1 newfile2 newfile3 test.sh
john@satsuki:~/subrepo$ echo "Added a new function" > newfile4
john@satsuki:~/subrepo$ git add newfile4
john@satsuki:~/subrepo$ git commit -a -m 'Added a new library file'
[master 94ad27e] Added a new library file
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 newfile4
john@satsuki:~/subrepo$ cd ..
john@satsuki:~$

Now that we have a new version of the project, let us try to pull those changes into our superproject.
john@satsuki:~$ cd coderepo
john@satsuki:~/coderepo$ cd tester
john@satsuki:~/coderepo/tester$ git status
# On branch master
nothing to commit (working directory clean)
john@satsuki:~/coderepo/tester$ git fetch origin
remote: Counting objects: 4, done.
remote: Compressing objects: 100%
remote: Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100%
From /home/john/subrepo
590e0eb..94ad27e master -> origin/master
john@satsuki:~/coderepo/tester$ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
john@satsuki:~/coderepo/tester$

As you can see, we are told that our branch is currently one commit behind origin/master. If we want to update our master branch in the submodule, we need to pull our changes in, just as we would in any other Git repository.
john@satsuki:~/coderepo/tester$ git pull
Updating 590e0eb..94ad27e
Fast-forward
newfile4 | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 newfile4
john@satsuki:~/coderepo/tester$ ls
newfile1 newfile2 newfile3 newfile4 test.sh
john@satsuki:~/coderepo/tester$ cd ..

Now let us see what happens if we try to update the module.
john@satsuki:~/coderepo$ git submodule update
Submodule path 'tester': checked out '590e0eb79bc5ba0bc09f611392e643f676b00a04'
john@satsuki:~/coderepo$ cd tester
john@satsuki:~/coderepo/tester$ ls
newfile1 newfile2 newfile3 test.sh
john@satsuki:~/coderepo/tester$

Our new changes have disappeared. How odd! Well actually not really. As we stated earlier, when we committed our .gitmodules file along with the tester directory, we not only committed the fact that we required a submodule, we also committed the exact point we wanted that submodule to point to. If we want to change this, then we must commit that as a change. It may seem a little odd that we have to jump through these hoops to get an update to an upstream project, but if you think about it, it actually makes a lot of sense. It means that anyone cloning our repository is sure to get a version of the submodule that we have decided is right for the project. So keeping this in mind, let us walk through a quick example of how we would finish the job and commit a new version of the submodule.
john@satsuki:~/coderepo$ cd tester/
john@satsuki:~/coderepo/tester$ git pull
You are not currently on a branch, so I cannot use any
'branch.<branchname>.merge' in your configuration file.
Please specify which remote branch you want to use on the command
line and try again (e.g. 'git pull <repository> <refspec>').
See git-pull(1) for details.
john@satsuki:~/coderepo/tester$

Interesting! What has happened here is that by performing the git submodule update command, we effectively asked Git to check out a commit. Remember in the past we talked about detached HEAD? This is exactly what Git has done. A submodule spends most of its life in a detached HEAD state. As we tell Git that we must have the submodule at a specific commit, Git checks out a commit rather than a branch. If you think about it, this makes sense: we do not want the contents of the module changing.
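The switch from a branch to a detached HEAD can be observed directly. The sketch below replays the steps in a throwaway sandbox and asks git symbolic-ref which branch HEAD points at before and after git submodule update. The sandbox paths, demo identity and README file are assumptions for illustration, and -c protocol.file.allow=always is only needed because newer Git releases restrict local-path submodule transfers.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream stand-in and superproject stand-in.
git init -q subrepo
(cd subrepo && echo "test suite" > test.sh && git add test.sh &&
    git commit -q -m 'Moved testing suite')
git init -q coderepo
cd coderepo
echo "main project" > README
git add README
git commit -q -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$sandbox/subrepo" tester
git commit -q -m 'Added submodule (subrepo)'

# Upstream gains a commit and we pull it into the submodule's branch.
(cd "$sandbox/subrepo" && echo "Added a new function" > newfile4 &&
    git add newfile4 && git commit -q -m 'Added a new library file')
git -C tester pull -q --ff-only

# symbolic-ref prints the branch name, or fails when HEAD is detached.
before=$(git -C tester symbolic-ref --short -q HEAD || echo detached)

# Put the submodule back on the commit recorded in the superproject.
git -c protocol.file.allow=always submodule --quiet update

after=$(git -C tester symbolic-ref --short -q HEAD || echo detached)
echo "before update: $before"
echo "after update:  $after"
```

Before the update the submodule sits on its branch. Afterwards HEAD is detached at the recorded commit, so newfile4 is no longer in the working directory, although the branch itself still contains it.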

So to bring our module up to date, we first need to check out master. Then we can issue our git pull.
john@satsuki:~/coderepo/tester$ git checkout master
Previous HEAD position was 590e0eb... Work on tester nf3
Switched to branch 'master'
john@satsuki:~/coderepo/tester$ git pull
Already up-to-date.

Oh? Should we not have seen some commits pulled in here? Actually, no. We pulled the changes into master earlier, when we ran the git pull. When the module reverted to the earlier commit, 590e0eb, it did not affect the master branch at all, as we simply checked out a single commit. So by switching to master, we have already altered the contents of the submodule directory, as can be seen below.
john@satsuki:~/coderepo/tester$ ls
newfile1 newfile2 newfile3 newfile4 test.sh
john@satsuki:~/coderepo/tester$ cd ..
john@satsuki:~/coderepo$ git status
# On branch master
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: tester (new commits)
#
no changes added to commit (use "git add" and/or "git commit -a")
john@satsuki:~/coderepo$

All we need to do now is to commit the submodule changes into the repository and check that the update yields the new file.
john@satsuki:~/coderepo$ git commit -a -m 'Up revd upstream module'
[master 022a163] Up revd upstream module
1 files changed, 1 insertions(+), 1 deletions(-)
john@satsuki:~/coderepo$ git submodule update
john@satsuki:~/coderepo$ cd tester/
john@satsuki:~/coderepo/tester$ ls
newfile1 newfile2 newfile3 newfile4 test.sh
john@satsuki:~/coderepo/tester$ cd ..
john@satsuki:~/coderepo$
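The whole update cycle comes down to two commit IDs: the one recorded in the superproject and the one checked out in the submodule. The sketch below rebuilds the scenario in a throwaway sandbox and compares the two before and after the superproject commit. The sandbox paths, demo identity, README file and the recorded_commit helper are assumptions for illustration. git submodule status marks a mismatch with a leading + character.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream stand-in and superproject stand-in.
git init -q subrepo
(cd subrepo && echo "test suite" > test.sh && git add test.sh &&
    git commit -q -m 'Moved testing suite')
git init -q coderepo
cd coderepo
echo "main project" > README
git add README
git commit -q -m 'Initial commit'
# Newer Git needs the file transport allowed for local-path submodules.
git -c protocol.file.allow=always submodule --quiet add "$sandbox/subrepo" tester
git commit -q -m 'Added submodule (subrepo)'

# Hypothetical helper: the commit ID the superproject records for tester.
recorded_commit() { git ls-tree HEAD tester | awk '{print $3}'; }

# Upstream moves on and the submodule's branch follows it.
(cd "$sandbox/subrepo" && echo "Added a new function" > newfile4 &&
    git add newfile4 && git commit -q -m 'Added a new library file')
git -C tester pull -q --ff-only

recorded_before=$(recorded_commit)
checked_out=$(git -C tester rev-parse HEAD)
flag_before=$(git submodule status tester | cut -c1)

# Record the new submodule commit in the superproject.
git add tester
git commit -q -m 'Up revd upstream module'

recorded_after=$(recorded_commit)
flag_after=$(git submodule status tester | cut -c1)
echo "recorded before: $recorded_before"
echo "checked out:     $checked_out"
echo "recorded after:  $recorded_after"
```

Once the superproject commit has been made, the recorded and checked-out IDs agree, and git submodule update has nothing left to change.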

As you can see, submodules can be rather useful. You can even make changes inside the submodule and commit them locally, perhaps to keep modifications that you want to make to the upstream code. As the submodule is a Git repository in its own right, you can merge upstream changes in too! Remember, though, that if you committed changes in the submodule and then issued a git submodule update without first committing the new submodule state in the superproject, the submodule would be moved back to the recorded commit and your commit would appear to be lost. Of course nothing in Git is ever really lost, but it would be prudent to always keep changes you make to submodules in a branch. That way they are easy to bring back if you make a mistake like the one described.
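The advice about branches can be tried out safely in a sandbox. In the sketch below a local change is committed on a branch inside the submodule, git submodule update is then run without committing in the superproject, and the branch is shown to still hold the work. The branch name local-tweaks, the file localfile, the sandbox paths and the demo identity are all hypothetical, and -c protocol.file.allow=always is only needed on newer Git releases.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream stand-in and superproject stand-in.
git init -q subrepo
(cd subrepo && echo "test suite" > test.sh && git add test.sh &&
    git commit -q -m 'Moved testing suite')
git init -q coderepo
cd coderepo
echo "main project" > README
git add README
git commit -q -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$sandbox/subrepo" tester
git commit -q -m 'Added submodule (subrepo)'

# Make a local change inside the submodule, on its own branch.
cd tester
git checkout -q -b local-tweaks
echo "local fix" > localfile
git add localfile
git commit -q -m 'Local change to the submodule'
local_commit=$(git rev-parse HEAD)
cd ..

# The mistake: update without committing the new state in the superproject.
git -c protocol.file.allow=always submodule --quiet update

recorded=$(git ls-tree HEAD tester | awk '{print $3}')
head_after=$(git -C tester rev-parse HEAD)
branch_tip=$(git -C tester rev-parse local-tweaks)
echo "submodule HEAD now: $head_after"
echo "local-tweaks tip:   $branch_tip"
```

The submodule is back on the recorded commit, but local-tweaks still points at the local commit, so checking out that branch inside tester brings the work back.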

With that all said and done, we have finished our tour of the major portions of Git. What follows in the next chapter are some other points that are added more for information on what can be done with Git.
