Week 1

"History Lesson"

A Brief History of Version Control

The Very Early Days

Version control systems have been around for nearly forty years (it is 2011 at the time of writing). During this time they have undergone an enormous amount of change and have evolved into some of the most powerful tools utilised in software development today. Chances are that in the early days you started off storing different versions of your source code and documents in separate files and folders. You may even have archived them off to compressed storage files, like zip or tar. Rest assured, you are not the first person to do this, and in 1972 a man called Marc J. Rochkind decided to create a system for storing revisions of documents and source code.

The system Marc created was called SCCS, short for Source Code Control System; in essence, probably the most apt description of what we mainly use a version control system for today. SCCS was originally written for an operating system called OS/360 MVT and was later rewritten in C. It remained the dominant version control system for UNIX until ten years later, when RCS was introduced.

Time To Move On

In 1982, Walter F. Tichy released RCS, standing for Revision Control System. It was intended to be free and to offer more functionality than SCCS. RCS is still maintained as part of the GNU project and, at the time of writing, is about to have its first new release in over fifteen years, version 5.8.

However, RCS, like its predecessor SCCS, has no way of dealing with groups of files. Essentially, each file has its own repository, stored alongside the file under a different name. Whilst rather advanced for its time, with primitive forms of branching, its interface, commands and version numbering have been described by some as rather cumbersome. Enter some successors.

CVS (Concurrent Versions System) was created in 1986 and began life as a set of shell scripts that operated on multiple files, using RCS to perform the actual repository management. As development continued, this way of working was dropped and CVS began operating on files itself, evolving into a version control system in its own right. The current iteration of CVS appeared in 1989 and, on November 1, 1990, version 1.0 was released to the Free Software Foundation for distribution.

CVS did not version file renames or moves at all. At the time, re-factoring (the process of modifying code to improve some non-functional attribute of the software) was often avoided, and so the feature was not thought to be required. CVS also did not support atomic commits. An atomic commit is used by more modern version control systems to safeguard the database: multiple changes are applied as a single operation, and if any of the changes fails to apply correctly, all the others are reverted and the commit is aborted. When designing CVS this was not seen as an obstacle, as the developers thought that a server and network should be resilient enough never to crash whilst committing.
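To make the idea concrete, here is a minimal sketch of what an atomic commit looks like from Git's own command line; the file names are purely illustrative.

    # Two related changes are staged together...
    git add src/parser.c src/parser.h

    # ...and recorded as one revision. Either the whole commit is
    # written to the repository, or none of it is.
    git commit -m "Rename the parse function and update its header"

Under CVS, by contrast, each file received its own revision independently, so a failure part way through could leave the repository holding only some of the intended changes.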

Whilst active development of CVS appears to have ceased as of May 2008, it is worth noting that CVS defined the model for branching that has been adopted and refined by almost every version control system since.

Offering Commercial Support

Now that version control was advanced enough, and people had begun to rely on VCSs in general, commercial offerings began to spring up. Three prominent systems were released within a short time of each other: ClearCase, VSS and Perforce. All three are proprietary systems, developed to fill a gap in the market for commercially supported version control.

VSS, originally developed by One Tree Software for several platforms, passed to Microsoft, who bought One Tree Software in 1994, whereupon Microsoft ceased development of VSS on all platforms other than Windows. VSS integrated with Visual Studio, Microsoft's Integrated Development Environment. Development of VSS has now ceased, but ClearCase, now developed by a division of IBM, and Perforce are both still actively developed and maintained.

The Millennium

The millennium brought with it a new breed of version control systems. Subversion, or SVN as it is colloquially known, was developed primarily as a replacement for, and mostly compatible successor to, CVS. SVN was first released in 2000 and by 2001 was sufficiently advanced to host its own source code. In November 2009 Subversion was accepted into the Apache Incubator, and it is currently developed and maintained by its community and by several commercial entities.

Subversion brought things to the table that previous version control systems had not. As it was released as free software, in the same vein as CVS, it was widely adopted by the open source community and later in commercial environments for its vastly improved feature set. For a start, SVN offers true atomic commits. This gave it a definite advantage over CVS, as it was seen as a truly robust alternative.

It also brought in features such as the tracking of files through renames and moves, including their entire version history, and the versioning of symbolic links. SVN moved with the times and introduced many other sought-after features, such as serving repositories over HTTP, cheaper branching, efficient network operation and native support for binary files.

As with all version control systems, there are aspects that people dislike. In Subversion, people often take issue with the implementation of tags, that is, names that point to specific points in the history of a repository. In SVN, a tag is actually a branch. Unlike other systems such as Git, or SVN's predecessor CVS, where a tag points directly at a specific point in history, SVN actually creates a snapshot of the file system. Although SVN employs relatively cheap branching, which is lightweight on the repository, the tagging model it uses can be incredibly heavyweight on the client.

Another issue with the tagging model in SVN is that a tag holds no history information. This makes it impossible, for example, to take two tags and find all the logged commits that occurred between one and the other. This is the difference between using a copy as a tag and implementing a true reference. Tags should also be implicitly read-only; by their very nature they refer to a single point in history. However, as tags are implemented as branches in SVN, this is not the case.
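To illustrate the difference, the following sketch shows the usual way of creating a tag in each system; the repository URL and tag names are hypothetical.

    # SVN: a tag is a copy of a tree, made inside the repository
    svn copy http://svn.example.com/repo/trunk \
             http://svn.example.com/repo/tags/1.0 \
             -m "Tagging release 1.0"

    # Git: a tag is simply a named reference to a single commit
    git tag -a v1.0 -m "Tagging release 1.0"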

Introducing the Linus Factor

The Linux kernel was at one point maintained under a source control system called BitKeeper. The decision, in 2002, to use BitKeeper for the management of the Linux kernel source was rather controversial, the main objection being that BitKeeper was a proprietary system offered by BitMover. At the time, BitMover allowed certain open source projects to use BitKeeper at no cost, so long as the community developers did not engage in the creation of a competing tool.

In April 2005, BitMover withdrew the free license that it had granted to the open source communities, after allegations that some parties had reverse engineered BitKeeper on an unrelated project. Due to the way BitKeeper worked, and to decisions made regarding licensing, it became impossible for several key developers, according to some reports including Linus himself, to use even a commercial version of BitKeeper.

It was due to these circumstances that Linus Torvalds decided to begin writing his own version control system, one that would give him all of the features he had had available with BitKeeper, the most important of these seemingly being a distributed environment. Linus decided on a set of criteria which, along with a distributed environment, also included a robust safeguard against corruption, be it accidental or malicious, and very high performance.

Git development began on the 3rd of April 2005, and by April the 7th, the project had been announced and was already able to host itself. On June 16th, the release of the Linux kernel version 2.6.12 was managed by Git. Junio Hamano, who had been a major contributor to the project, took over maintenance of Git in July, and by December had released version 1.0 to the community.

Interestingly enough, another project was also created as a result of this chain of events. Mercurial, or Hg as it is often known, is reported to have begun development on April 19th of the same year, started by Matt Mackall with largely the same goals as Git. Though Git was chosen as the version control system for the Linux kernel, Mercurial is actively used by many other projects and shares a design and concept very similar to Git's.

Design Changes

Git has some features that should be discussed here, as many of them differ from those of every other version control system. Perhaps one of the most important of these is the very strong emphasis on non-linear development. Git provides many tools for working with many branches and merges, with a core principle being that changes will often be passed around more than they are written.
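As a small taste of what that means in practice, the sketch below creates a topic branch, records some work on it and merges it back; the branch name is purely illustrative.

    # Start a topic branch for a piece of work
    git checkout -b better-error-messages

    # ...edit, then record the changes on the branch...
    git commit -a -m "Improve the error messages"

    # ...and fold the branch back in when it is ready
    git checkout master
    git merge better-error-messages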

Git is very fast. Though for certain operations it may be slower than some of its peers, Git has consistently been shown to be faster than most. This performance was an essential design goal, as the Linux kernel is a very large project.

When developing systems that offer any kind of security to an end user, it is essential to provide a way of auditing the history of the code, to ensure that no tampering has taken place. Due to the way Git is implemented, each SHA-1 hash used to identify a particular commit depends on the entire history of the repository leading up to it. What does this mean to someone viewing the repository? Once the repository is published, it is impossible for anyone to tamper with its history without it being noticed. This is called cryptographic authentication of history.
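You can see this chaining for yourself by asking Git to print the raw contents of a commit; the hashes, names and message below are placeholders only.

    $ git cat-file commit HEAD
    tree 9bedf67800b2923982bdf60c89c57ce6fd2e9213
    parent 3c4e9cd789d88d8d89c1073707c3382e8e414ef7
    author A Developer <dev@example.com> 1301825000 +0100
    committer A Developer <dev@example.com> 1301825000 +0100

    Fix the frobnicator

Because every commit records the hash of its parent, and that parent records the hash of its parent in turn, altering any commit changes the identity of every commit that follows it.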

Of course the most important feature, and one which will be discussed in great detail later in the book, is that of distributed development. The emphasis on non-linear development and the implementation of very cheap and fast branching make Git one of the best version control systems on the market for distributed development.

As mentioned previously, every version control system has advantages and disadvantages, and it would not be fair to make out that Git is without flaws. One criticism raised over the years is its particularly steep learning curve, even for a basic understanding. Whilst this is true to some degree, it is largely due to the fact that Git is a distributed version control system, and such systems are inherently more complex in their implementation than others. Another bugbear for some people is the fact that Git will not track empty directories.
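If an empty directory really does need to appear in a fresh checkout, a common workaround is to place a placeholder file inside it so that Git has something to track; the name .gitkeep used below is only a convention, not something Git itself recognises.

    mkdir logs
    touch logs/.gitkeep
    git add logs/.gitkeep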

Wrapping Up

Though there are many other version control systems out there that are being actively developed, such as Bazaar, Plastic and Darcs, to name but a few, we are going to end our historical tale here and continue with learning more about the Git version control system. There is a plethora of information available on the Internet about version control, so if you want to find more information about any of the systems mentioned here, that would probably be the best place to start.
