Week 7

Day 1 - "Networking with a difference"

Pushing across a LAN

We now have a complete copy of our repository in another location. So far, though, the clone lives on the same machine as the original, which is not much use for backup purposes. Git supplies several means of talking to a remote machine, and by far the most common is the SSH protocol. SSH is a secure, encrypted way to communicate with a remote repository, which is a must when pushing to an important repository that other people are going to pull from.

Let us assume for a moment that our user john has moved to another machine and wishes to clone a repository from his original machine onto the new one. The command has the same form as the one we used before. We are going to assume that john already has SSH access to the original machine, so we can issue the command as follows.

john@akira:~$ git clone ssh://john@satsuki/home/john/coderepo coderepo-ne

Initialized empty Git repository in /home/john/coderepo-ne/.git/

john@satsuki's password:

remote: Counting objects: 53, done.

remote: Compressing objects: 100%

Receiving objects: 100%

Resolving deltas: 100%

remote: Total 53 (delta 10), reused 0 (delta 0)

john@akira:~$

Just as before, we have cloned our repository to a local folder called coderepo-ne, this time from the remote URL ssh://john@satsuki/home/john/coderepo. Notice the ssh:// prefix, which selects the SSH protocol. We have also put the user's name in the URL. If the SSH server were listening on a port other than the usual 22, we could also have added the port number, preceded by a colon, after the hostname.
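As a minimal sketch of the port syntax just described, the following records an SSH remote that includes a port number and prints the URL back. Port 2222 is a hypothetical value chosen purely for illustration, and the repository is created in a throwaway directory; nothing is contacted over the network.

```shell
#!/bin/sh
# Sketch: record an SSH remote URL that carries a non-default port.
# Port 2222 is a hypothetical value; no network connection is made.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

git init -q "$workdir/coderepo-ne"
cd "$workdir/coderepo-ne"

# URL form: ssh://user@host:port/absolute/path
git remote add origin ssh://john@satsuki:2222/home/john/coderepo

# Print the URL that Git stored for the remote.
configured_url=$(git config --get remote.origin.url)
echo "$configured_url"
```

The same URL could be handed directly to git clone in place of the one used earlier, assuming the SSH server on satsuki really does listen on that port.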

SSH isn't the only protocol that Git can use. We have already looked at two: local and SSH. Git supports a further two, HTTP/S and Git's own Git protocol. We are going to take a quick look at the Git protocol next, before moving on to HTTP/S.

The Git Protocol

The Git protocol is generally the fastest transfer protocol available for Git repositories. This should come as no surprise, since it was developed exclusively for use with Git. It does, however, have a relatively large drawback: it provides absolutely no authentication. For this reason, enabling the Git protocol on a repository and running the server back-end (described later) will allow anyone who can reach the server's port complete read access to the repository.

If you are serving a large repository on the Internet, for example, this can actually be rather beneficial, as it allows you to serve pulls quickly and efficiently. However, although it is possible to enable pushing over the Git protocol, the lack of authentication means that anyone who can see the server and connect to the port, usually 9418, could make changes to the repository. This is almost always undesirable, so people often couple a read-only Git protocol with writable SSH access for the developers who need to push.
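To make the read-only arrangement a little more concrete, here is a rough sketch of preparing a bare repository for export over the Git protocol. The directory layout, the README file and the commit identity are all assumptions made for the example. The daemon and client commands appear only as comments, since starting a server is beyond what a short example should do.

```shell
#!/bin/sh
# Sketch: prepare a bare repository for read-only serving over the Git protocol.
# Everything lives in a throwaway directory; all names are illustrative.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A small stand-in for john's repository.
git init -q "$workdir/coderepo"
cd "$workdir/coderepo"
echo "hello" > README
git add README
git -c user.name=john -c user.email=john@example.com commit -q -m "Initial commit"

# Bare copy that would be served to clients.
git clone -q --bare "$workdir/coderepo" "$workdir/coderepo.git"

# git daemon refuses to export a repository without this marker file
# (unless the daemon is started with --export-all).
export_marker="$workdir/coderepo.git/git-daemon-export-ok"
touch "$export_marker"
ls "$export_marker"

# On the server, the daemon could then be started along these lines
# (it listens on port 9418 by default and does not accept pushes
# unless the receive-pack service is explicitly enabled):
#   git daemon --reuseaddr --base-path="$workdir" "$workdir"
# A client would then clone with a git:// URL, for example:
#   git clone git://satsuki/coderepo.git
```

Leaving the receive-pack service disabled is what keeps this read-only; developers who need to push would use the SSH URL instead.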

The HTTP/S Protocol

Git can also be served over HTTP or HTTPS. Setting this up is usually as simple as creating a bare clone of your repository, keeping it up to date, usually via a post-update hook (described later in the book), and allowing clients access to the area of the server that holds it.

Note that the above provides only read-only access over HTTP. It is possible to allow write access, i.e. pushing, over HTTPS, but this is more complicated to set up and is outside the scope of this book.
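The read-only setup described above might be sketched as follows. This assumes a throwaway directory standing in for the area a web server would expose; the hook body shown is one common way of keeping the served metadata current, not necessarily the configuration used later in the book.

```shell
#!/bin/sh
# Sketch: a bare clone kept ready for read-only HTTP serving.
# Paths and the commit identity are illustrative assumptions.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A small stand-in for the repository being published.
git init -q "$workdir/coderepo"
cd "$workdir/coderepo"
echo "hello" > README
git add README
git -c user.name=john -c user.email=john@example.com commit -q -m "Initial commit"

# Bare clone placed where the web server would serve files from.
git clone -q --bare "$workdir/coderepo" "$workdir/coderepo.git"
cd "$workdir/coderepo.git"

# The post-update hook runs after each push into this repository and
# refreshes the files that plain HTTP clients read to discover refs and packs.
mkdir -p hooks
cat > hooks/post-update <<'EOF'
#!/bin/sh
exec git update-server-info
EOF
chmod +x hooks/post-update

# Run it once by hand so the metadata exists before the first push arrives.
git update-server-info
cat info/refs
```

With a web server exposing that directory, a client could clone from an address such as http://satsuki/coderepo.git, where the hostname and path are again only examples.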

Protocol decision

Tamagoyaki are about to embark on their decision-making process regarding which protocols to use and how to collaborate, both among themselves and with their external partners. They are going to have to take several things into consideration, such as security, speed, administration and storage. When you begin to implement the Git system yourself, you too will have to think about these decisions and answer questions like:

Who is going to require access to the repository?
How many people are going to require access to the repository?
Is the information sensitive, either from an IP perspective or from a customer point of view?
How large is the data that we are hosting?
How large is the change set?
Do we need a QA area?
Do we need a Production area?

This is just a short list of the questions that you will need to consider when implementing a full-on Git environment. The beauty of the Git system, though, is that it is flexible, and it is very difficult to box yourself into a corner where a decision made early on prohibits a different approach later.