Week 8

Day 4 - "Let's make a library"

Splitting the atom

Sometimes, after a project has been running for a while, certain components turn out to be rather useful in their own right. When this happens, people often want to move them out of the original project and maintain them as a separate library. The easiest way to do this is simply to copy the files out of the main project and into a new directory. However, doing so would disconnect the new library from all of its development history up to that point.

Using git filter-branch, we can pull out a folder and retain all of its history. The idea is that we rewrite the history into a new branch, keeping only the changes made to one particular folder and storing that folder's contents at the root of the branch. Let us see how this works with a quick example. Remember the tester folder we created earlier? We are going to make a few commits to the files in this folder to give it some history.
john@satsuki:~/coderepo$ echo "More development work" >> tester/newfile1
john@satsuki:~/coderepo$ git commit -a -m 'Work on tester nf1'
[master 1a4956b] Work on tester nf1
1 files changed, 1 insertions(+), 0 deletions(-)
john@satsuki:~/coderepo$ echo "More dev work" >> tester/newfile2
john@satsuki:~/coderepo$ git commit -a -m 'Work on tester nf2'
[master 7156104] Work on tester nf2
1 files changed, 1 insertions(+), 0 deletions(-)
john@satsuki:~/coderepo$ echo "Even more dev work" >> tester/newfile3
john@satsuki:~/coderepo$ git commit -a -m 'Work on tester nf3'
[master 1433223] Work on tester nf3
1 files changed, 1 insertions(+), 0 deletions(-)
john@satsuki:~/coderepo$

Now we are going to split that folder off into a separate branch, which we will then clone into a new Git repository. Once we have copied the history of the tester folder to a new branch, see if you can run through in your head the steps we would need to take to pull this branch into a new repository.
john@satsuki:~/coderepo$ git checkout -b tester_split
Switched to a new branch 'tester_split'
john@satsuki:~/coderepo$ git filter-branch --subdirectory-filter tester
Rewrite 1433223d9c8a8abc35410d12cf78128c318b6e42 (4/4)
Ref 'refs/heads/tester_split' was rewritten
john@satsuki:~/coderepo$ git branch
develop
master
* tester_split
wonderful
zaney
john@satsuki:~/coderepo$ ls
newfile1 newfile2 newfile3 test.sh
john@satsuki:~/coderepo$ git checkout master
Switched to branch 'master'
john@satsuki:~/coderepo$ ls
another_file cont_dev tester
john@satsuki:~/coderepo$

So now the directory has been split away from the original source code into a new branch. Have a think about what steps you would take to bring this into an entirely new repository.
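The split above can be rehearsed in a throwaway repository before trying it on real code. What follows is a minimal sketch, assuming a POSIX shell and Git on the PATH; the scratch directory, file contents and commit identity are made up for illustration and only loosely mirror the book's repository.

```shell
#!/bin/sh
# Minimal sketch: rebuild a tiny coderepo and split tester/ into its own branch.
set -e
# Commit identity for the scratch repo (illustrative values).
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com
# Git 2.24 and later print a warning and pause before filter-branch runs;
# this variable skips the pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/master   # same branch name whatever init.defaultBranch says

# Some project code plus a tester/ component with a little history.
echo "main code" > another_file
mkdir tester
for f in newfile1 newfile2 newfile3; do echo "start" > "tester/$f"; done
git add . && git commit -q -m 'Initial commit'
for f in newfile1 newfile2 newfile3; do
    echo "More development work" >> "tester/$f"
    git commit -q -a -m "Work on tester $f"
done

# Copy master to a new branch, then rewrite that branch so tester/ is its root.
git checkout -q -b tester_split
git filter-branch --subdirectory-filter tester

ls                                   # newfile1 newfile2 newfile3
git --no-pager log --format=oneline  # the four commits that touched tester/
git checkout -q master
ls                                   # another_file tester
```

Because filter-branch rewrites every commit it keeps, the hashes on tester_split differ from those on master even where the content is the same.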

Note - More backups

Git likes to make things easy for you. You may not have noticed it before, but when you use git filter-branch to rewrite a branch, Git keeps a backup of the branch's tip as it was before the rewrite. This backup is stored in the ref refs/original/refs/heads/<branch_name>, and the commit ID it holds can be used to revert the branch to its original state if the filter goes horribly wrong.
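To see the backup in action, the sketch below rewrites a branch, reads the saved tip from refs/original, and then uses it to undo the rewrite. It is a minimal sketch, assuming a POSIX shell and a scratch repository with made-up contents. Deleting the backup ref at the end matters in practice: while it exists, a second git filter-branch on the same branch refuses to run unless it is given -f.

```shell
#!/bin/sh
# Minimal sketch: inspect and use the refs/original backup left by filter-branch.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com \
       FILTER_BRANCH_SQUELCH_WARNING=1   # skip the pause newer Git adds

work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/master
mkdir tester
echo "dev work" > tester/newfile1
echo "main code" > another_file
git add . && git commit -q -m 'Initial commit'

git checkout -q -b tester_split
git filter-branch --subdirectory-filter tester
rewritten=$(git rev-parse tester_split)

# The pre-rewrite tip of the branch is kept under refs/original/.
git rev-parse refs/original/refs/heads/tester_split

# Suppose the filter went horribly wrong: put the branch back where it was.
git reset -q --hard refs/original/refs/heads/tester_split

# Once happy with the result, drop the backup so later rewrites are not blocked.
git update-ref -d refs/original/refs/heads/tester_split
```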

In the trenches...
"So John, I managed to split the Atom library out into a new branch like you said, but I have no idea how to pull this into a new repo." Jack was finally feeling like he had gotten to grips with Git, but his latest task had left him feeling a little dejected. He idly stabbed at his leg with a pen whilst waiting for John to finish tapping away.

John lifted his hands from the keyboard and turned his chair. "You really can't think of a way to copy what we have in one repo into another?"

Suddenly it was like a light bulb had exploded with light inside Jack's skull. "CLONES!" he shouted.

We actually have at least four methods we can use to do this.
  1. Copy the data from one repo to another with a simple copy and paste
  2. Clone our repository, delete all of the branches other than tester_split and then rename that branch to master
  3. Initialise a new repository, set up a remote pointing to the original and then fetch our tester_split branch
  4. Create a bundle of the tester_split branch and then clone from the bundle into a new repository

The first of these would leave us with no development history at all, so let us ignore it, as it is not what we require. The second is trivial and should need little explanation: we simply clone and then, using the usual tools, delete all of the unnecessary branches. However, this second method does have its disadvantages, namely that when we clone the repository, we copy every single object from the source repository into the new one. Whilst this is generally not a problem, it means that we would have to run some fairly aggressive garbage collection to remove all of these unwanted objects. This would happen naturally over time as the objects aged and were no longer referenced, but it would leave us with a repository that was initially much larger than it needed to be.
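The clone-and-prune approach of the second method can be sketched as follows. It is a minimal sketch, assuming a scratch source repository with made-up contents; the directory name subrepo-clone is hypothetical. The reflog expire and gc --prune=now steps are the fairly aggressive garbage collection mentioned above: without them, objects from the other branches would linger until they aged out.

```shell
#!/bin/sh
# Minimal sketch of method 2: clone everything, keep tester_split, prune the rest.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com \
       FILTER_BRANCH_SQUELCH_WARNING=1

# A scratch source repository with a tester_split branch.
work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/master
mkdir tester
echo "dev work" > tester/newfile1
echo "main code" > another_file
git add . && git commit -q -m 'Initial commit'
git checkout -q -b tester_split
git filter-branch --subdirectory-filter tester
git checkout -q master
old_tip=$(git rev-parse master)    # a commit we do NOT want in the new repo

# Clone, checking out tester_split, and make it the only branch, named master.
git clone -q --branch tester_split "$work/coderepo" "$work/subrepo-clone"
cd "$work/subrepo-clone"
git branch -m tester_split master
git remote rm origin               # also deletes every origin/* remote-tracking ref

# The clone still holds every object from coderepo; expire the reflogs and
# prune so that unreachable objects are deleted now rather than later.
git count-objects -v
git reflog expire --expire=now --all
git gc -q --prune=now
git count-objects -v
```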

The other two methods deserve a little more consideration, as they both perform much better in this respect. You should already know enough from previous material to perform the third method right now. However, using the fetch command as we have done before would again pull in many more objects than we require, so in the following output we add a subtle twist to the command.
john@satsuki:~/coderepo$ cd ../
john@satsuki:~$ mkdir subrepo
john@satsuki:~$ cd subrepo/
john@satsuki:~/subrepo$ git init
Initialized empty Git repository in /home/john/subrepo/.git/
john@satsuki:~/subrepo$ git remote add source /home/john/coderepo
john@satsuki:~/subrepo$ git fetch source +tester_split:master
fatal: Refusing to fetch into current branch refs/heads/master of non-bare repository
fatal: The remote end hung up unexpectedly
john@satsuki:~/subrepo$

What we have asked Git to do is to fetch only the tester_split branch from the remote we called source and to place it into master locally. Think of +<branch>:<branch> as +<source>:<destination> and all will make sense; the leading + tells Git to update the destination even if the update is not a fast-forward. As you can see, Git is not too happy about our intentions here, as it refuses to fetch into the currently checked-out branch (master) of a non-bare repository. That is OK; we have another way around this.
john@satsuki:~/subrepo$ git fetch source +tester_split:tmp
remote: Counting objects: 15, done.
remote: Compressing objects: 100%
remote: Total 15 (delta 3), reused 0 (delta 0)
Unpacking objects: 100%
From /home/john/coderepo
* [new branch] tester_split -> tmp
john@satsuki:~/subrepo$ git branch -m tmp master
john@satsuki:~/subrepo$

So we have deceived Git a little here, but I think we can live with ourselves. By first fetching the branch into a tmp branch, we were then able to rename it to master. Notice the number of objects required for this branch: 15. If you remember when we cloned our repository a few weeks ago, this value was a lot higher. It was the subtle +<source>:<destination> refspec, which names a single branch, that prevented us from pulling every last object from the source repository into our new, slim sub-repository.
john@satsuki:~/subrepo$ ls
john@satsuki:~/subrepo$ git checkout master
Already on 'master'
john@satsuki:~/subrepo$ ls
newfile1 newfile2 newfile3 test.sh
john@satsuki:~/subrepo$

Notice that there are no files in the working directory until we check out the branch. This is because all the fetch did was fetch the objects and place them in the repository's object directory; it did not place anything in the working directory. If you remember, this is the same behaviour we saw when fetching before. So we now have a complete copy of the tester component, with its history, in a new repository. If we run git log, we can see the history of its development.
john@satsuki:~/subrepo$ git log --format=oneline
590e0eb79bc5ba0bc09f611392e643f676b00a04 Work on tester nf3
785b86d877d2a5c0679d98181a23d06ed2ba7652 Work on tester nf2
1ff89f787438f081a0d74de2d26eb2d831c9c738 Work on tester nf1
a5a0d9762dd4b50d8f3228e37b315f6056d5a034 Moved testing suite
john@satsuki:~/subrepo$
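The whole of the third method fits in a short script. This is a minimal sketch, assuming a scratch source repository with made-up contents; the explicit symbolic-ref lines are an addition so that the branch is called master regardless of any init.defaultBranch setting. It also shows the point made about fetch: the working directory stays empty until the branch is checked out.

```shell
#!/bin/sh
# Minimal sketch of method 3: fetch a single branch into a brand new repository.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com \
       FILTER_BRANCH_SQUELCH_WARNING=1

# Scratch source repository with a split tester_split branch.
work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/master
mkdir tester
echo "dev work" > tester/newfile1
echo "main code" > another_file
git add . && git commit -q -m 'Initial commit'
git checkout -q -b tester_split
git filter-branch --subdirectory-filter tester
git checkout -q master

# The new repository; name its unborn branch master explicitly.
git init -q "$work/subrepo"
cd "$work/subrepo"
git symbolic-ref HEAD refs/heads/master
git remote add source "$work/coderepo"

# Fetch only tester_split (the + permits a forced update) into tmp, then rename.
git fetch source +tester_split:tmp
git branch -m tmp master

ls                      # prints nothing: fetch does not touch the working directory
git checkout -q master
ls                      # newfile1
git --no-pager log --format=oneline
```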

Unfortunately, since some of the development work on these files happened while they lived outside this directory, that part of their history was lost in the split. Keep this in mind should you ever perform this kind of operation.
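The loss is easy to demonstrate. In the minimal sketch below, which uses a made-up scratch repository, newfile1 is committed at the top level, then moved into tester/ and worked on further. git log --follow on master still finds all three commits, but the split branch starts at the move, because --subdirectory-filter only keeps commits that touched tester/.

```shell
#!/bin/sh
# Minimal sketch: history from before a file entered tester/ does not survive the split.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com \
       FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/master

echo "early work" > newfile1                  # starts life outside tester/
git add newfile1 && git commit -q -m 'Early work on newfile1'
mkdir tester && git mv newfile1 tester/newfile1
git commit -q -m 'Moved testing suite'
echo "more work" >> tester/newfile1
git commit -q -a -m 'Work on tester nf1'

# On master, --follow tracks the file across the move: three commits.
git --no-pager log --follow --format=%s -- tester/newfile1

git checkout -q -b tester_split
git filter-branch --subdirectory-filter tester

# The split branch begins at 'Moved testing suite': two commits.
git --no-pager log --format=%s
```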

Little bundles of joy

Git has so many ways to do things. This is partly what makes it a little daunting for those just starting, but after you have gained a little experience, you begin to understand just what is happening in the background. When this realisation hits, you are able to think, almost immediately, of at least two different ways of doing the same thing. There have been numerous examples throughout the book where there were multiple ways to complete the same task. Here we are going to look at one more way to create a new repo from our tester_split branch.

The tool we are going to introduce here is git bundle. The bundle utility allows us to export a set of revisions and archive them to a file. This file can then be cloned, fetched or pulled from, and it can be regenerated later to carry new work. This is especially useful if there is no network connection between two computers and you wish to sync some of the data from one to the other. Let us take a quick look at how we could use the bundle tool in this case.
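As a rough illustration of a bundle acting as an offline transport, the sketch below creates a bundle, clones from it, and later overwrites the same file and pulls the new commit through it. It is a minimal sketch, assuming a scratch repository in which tester_split is simply created as an ordinary branch (a stand-in for the split branch); in real use the bundle file is what you would carry between machines. Cloning with -b tester_split names the branch to check out, because a bundle created this way records no HEAD.

```shell
#!/bin/sh
# Minimal sketch: a bundle file as a source that can be cloned and later fetched from.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com

work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/tester_split   # stand-in for the split branch
echo "dev work" > newfile1
git add . && git commit -q -m 'Work on tester nf1'

# Archive the branch and check that the file is usable.
git bundle create "$work/tester.bundle" tester_split
git bundle verify "$work/tester.bundle"

# "Carry" the file elsewhere and clone from it; origin now points at the bundle.
git clone -q -b tester_split "$work/tester.bundle" "$work/subrepo-b"

# Later: new work at the source, so overwrite the bundle with a fresh one...
echo "more dev work" >> newfile1
git commit -q -a -m 'More work on tester nf1'
git bundle create "$work/tester.bundle" tester_split

# ...and pull it into the clone just as if origin were a normal remote.
git -C "$work/subrepo-b" pull -q --ff-only
```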
john@satsuki:~/coderepo$ git bundle create ../tester.bundle tester_split
Counting objects: 15, done.
Compressing objects: 100%
Writing objects: 100%
Total 15 (delta 3), reused 0 (delta 0)
john@satsuki:~/coderepo$ cd ..
john@satsuki:~$ git clone tester.bundle subrepo-b
Cloning into subrepo-b...
warning: remote HEAD refers to nonexistent ref, unable to checkout.

john@satsuki:~$

The syntax is fairly simple. The word create tells Git to create a new bundle. After this we specify a filename and then the branch whose history we want to archive. However, as can be seen above, there is a problem. A bundle only records the references we name when creating it, here refs/heads/tester_split, so our bundle contains no HEAD at all. When we clone from it, Git has no remote HEAD telling it which branch to check out, and so it complains. Had we passed -b tester_split to git clone, naming the branch to check out, there would have been no complaints.
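A quick way to see what a bundle will offer to a clone is git bundle list-heads. In the minimal sketch below (a scratch repository with made-up contents, with tester_split created directly as a stand-in for the split branch) the listing contains refs/heads/tester_split and no HEAD, which is why a plain clone cannot decide what to check out. Naming the branch with git clone -b sidesteps the problem.

```shell
#!/bin/sh
# Minimal sketch: see which refs a bundle carries, then clone a named branch from it.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com

work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/master
echo "main code" > another_file
git add . && git commit -q -m 'Initial commit'
git checkout -q -b tester_split                 # stand-in for the split branch
echo "dev work" > newfile1
git add newfile1 && git commit -q -m 'Work on tester nf1'
git checkout -q master                          # master is checked out, as in the text

git bundle create "$work/tester.bundle" tester_split
git bundle list-heads "$work/tester.bundle"    # one line: <sha> refs/heads/tester_split

# Tell clone which branch to check out instead of relying on a HEAD the bundle lacks.
git clone -q -b tester_split "$work/tester.bundle" "$work/subrepo-b"
ls "$work/subrepo-b"                            # another_file newfile1
```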

So all we have to do is point a new master branch at the tip of tester_split. As you can see below, there seem to be no branches at all, and when we try to check out master, it does not exist. What actually happened is that the objects were cloned into the repository, but as the bundle gave Git no HEAD to follow, no local branch was created. With a little git reset trickery, we can create our master branch in the new repository.
john@satsuki:~$ cd subrepo-b/
john@satsuki:~/subrepo-b$ git branch
john@satsuki:~/subrepo-b$ git checkout master
error: pathspec 'master' did not match any file(s) known to git.
john@satsuki:~/subrepo-b$ git reset --hard origin/tester_split
HEAD is now at 590e0eb Work on tester nf3
john@satsuki:~/subrepo-b$ git checkout master
Already on 'master'
john@satsuki:~/subrepo-b$ ls
newfile1 newfile2 newfile3 test.sh
john@satsuki:~/subrepo-b$

Now our repository is complete, just as before, and we have successfully remapped the master branch so that it points to the same commit as origin/tester_split.
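The recovery can be reproduced end to end. This minimal sketch assumes a scratch repository with made-up contents and a bundle holding only tester_split; the explicit symbolic-ref line is an assumption added so that the new branch is called master even where init.defaultBranch is set to something else.

```shell
#!/bin/sh
# Minimal sketch: clone a HEAD-less bundle, then create master with git reset.
set -e
export GIT_AUTHOR_NAME=John GIT_AUTHOR_EMAIL=john@example.com \
       GIT_COMMITTER_NAME=John GIT_COMMITTER_EMAIL=john@example.com

work=$(mktemp -d)
git init -q "$work/coderepo"
cd "$work/coderepo"
git symbolic-ref HEAD refs/heads/tester_split   # stand-in for the split branch
echo "dev work" > newfile1
git add . && git commit -q -m 'Work on tester nf3'
git bundle create "$work/tester.bundle" tester_split

# A plain clone warns that it cannot check anything out.
git clone "$work/tester.bundle" "$work/subrepo-b"
cd "$work/subrepo-b"
git symbolic-ref HEAD refs/heads/master         # point HEAD at an (unborn) master
git --no-pager branch                           # expected to list nothing yet

# Resetting an unborn branch creates it at the given commit and fills the work tree.
git reset -q --hard origin/tester_split
git --no-pager branch                           # * master
ls                                              # newfile1
```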

In the trenches...
Martha and John were sitting together in the office. The rest of the team had left hours ago and it was getting really late. Martha broke the silence, "So we've pulled the Atom library out," she giggled before continuing, "but how the heck do we put it back in again?"

"I'm really not sure," said John, taking another swig of coffee before placing the mug back down on the desk. On its side was written the word GIT in large marker pen, a gift from Klaus.

Martha sighed. "It's getting pretty late John. I think I'm gonna head out."

"Yeah, I know what you mean," started John, "I think I'll get going too. Thanks for the help Martha."

"Anytime John."
