Week 3

Day 3 - "What actually changed?"

Doing the diff dance

Knowing what the committer thinks they committed is brilliant. However, sometimes it's just not enough. The reason for this is stated fairly precisely in the first sentence of this paragraph, so let us add a little emphasis to bring out the real meaning: knowing what the committer thinks they committed, with the stress on thinks, is brilliant. By looking at the commit message we only know as much as the committer wants us to. If they are the helpful sort, this will probably be all that we need, most of the time. On the other hand, there is always the situation where you'd like to know a little more about what was actually placed into the repository.

The git diff command can show us exactly that. For more information about diff in general, see the diff breakout box in this chapter. Think of a diff as an easy way of looking at the differences between two files, surrounded by a little context. This can often be enhanced by a visual diff viewer, but for now, let's stick with our simple text-based git diff.

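If we want to find out what has changed between one of our previous commits and our current one, we can write a command like the one below. Here 163f061 refers to the second commit that we made to the repository. Strictly speaking, git diff with a single commit name compares that commit against the working tree, but as we have no uncommitted changes, that is the same as comparing it against our current commit.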
john@satsuki:~/coderepo$ git diff 163f061
diff --git a/my_second_committed_file b/my_second_committed_file
index 3ad4cc3..095b9cd 100644
--- a/my_second_committed_file
+++ b/my_second_committed_file
@@ -1 +1,2 @@
 Change1
+Change2
john@satsuki:~/coderepo$

What this is telling us is that between 163f061 and our current commit 9938a0c, the line Change2 was added to the file my_second_committed_file. We can see this from the + preceding the line Change2. Let's make a few changes to our repository and see how the diffs look. We're going to make these changes using a text editor so that you can't see what we've done. Then, hopefully, when we run git diff you'll be able to see clearly what has happened.
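Before we look at those changes, it may help to see the two forms of git diff side by side. The sketch below is not part of John's repository: it builds a throwaway repository (the identity and commit messages are made up for illustration) and compares a commit first against the working tree, and then against another named commit.

```shell
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.invalid"

printf 'Change1\n' > my_second_committed_file
git add my_second_committed_file
git commit -q -m "First version"
old=$(git rev-parse --short HEAD)

printf 'Change1\nChange2\n' > my_second_committed_file
git commit -q -a -m "Second version"

# One commit name: compare that commit against the working tree.
git diff "$old"

# Two commit names: compare the commits themselves, so uncommitted
# edits in the working tree play no part in the result.
git diff "$old" HEAD

# Count the added lines that read exactly "Change2".
added=$(git diff "$old" HEAD | grep -c '^+Change2$')
echo "$added"
```

With a clean working tree the two diff commands print the same thing, which is why the shorter form was enough for us above.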
john@satsuki:~/coderepo$ git log HEAD~1..HEAD
commit a022d4d1edc69970b4e8b3fe1da3dccd943a55e4
Author: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 22:05:55 2011 +0100

Messed with a few files
john@satsuki:~/coderepo$

The command git log HEAD~1..HEAD tells Git to show us the log for every commit that is reachable from HEAD but not from HEAD~1. The notation is new to us, but seeing as HEAD points to the most recent commit, HEAD~1 points to the commit just before it, its parent. The range therefore contains only the most recent commit, which is how we tell Git to show us that commit alone.
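To get a feel for the range notation, here is a small sketch in a scratch repository. The file name notes.txt, the identity and the commit messages are all invented for the example; only the git commands themselves matter.

```shell
# Scratch repository with three commits in a straight line.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.invalid"

for n in 1 2 3; do
  printf 'line %s\n' "$n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

# A..B selects the commits reachable from B but not from A.
last_one=$(git log --format=%s HEAD~1..HEAD)
echo "$last_one"

# Widening the range by one step selects the last two commits.
last_two=$(git rev-list --count HEAD~2..HEAD)
echo "$last_two"

# -1 is a shorter way to ask for just the newest commit.
newest=$(git log -1 --format=%s)
echo "$newest"
```

The --format=%s option prints only the subject line of each commit, which keeps the output short enough to compare at a glance.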

As it turns out, John Haskins didn't really create a very meaningful log message. Messed with a few files is pretty unhelpful in the grand scheme of things. So let's be thankful that this isn't Tamagoyaki Inc's core repository and take a look at what actually happened in the commit a022d4d.
john@satsuki:~/coderepo$ git diff HEAD~1..HEAD
diff --git a/my_second_committed_file b/my_second_committed_file
index 095b9cd..c9887f8 100644
--- a/my_second_committed_file
+++ b/my_second_committed_file
@@ -1,2 +1 @@
-Change1
-Change2
+Changed this file completely
diff --git a/my_third_committed_file b/my_third_committed_file
new file mode 100644
index 0000000..5d27866
--- /dev/null
+++ b/my_third_committed_file
@@ -0,0 +1 @@
+Addition to the line
john@satsuki:~/coderepo$

As you can see, we have several things going on here, so let's dissect the diff and see what each section means.
diff --git a/my_second_committed_file b/my_second_committed_file

This first line tells us that we are dealing with my_second_committed_file. The a/ and b/ prefixes label the two sides of the comparison: a is the first revision, and b is the second.
index 095b9cd..c9887f8 100644

This second line gives us the abbreviated object IDs, as they are stored in the repository, followed by the file mode (100644 denotes an ordinary, non-executable file). Note that these IDs are not commit IDs, but the blob IDs that Git uses to refer to the contents of the file before and after the change. For more information on this, check out the Object's living in harmony breakout box.
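If you would like to convince yourself that the IDs on the index line really are blob IDs, the following sketch shows one way to look them up. It uses a scratch repository with a made-up identity and commit message, so the IDs it prints will not match the ones in John's repository unless the file contents are byte-for-byte identical.

```shell
# Scratch repository holding a single committed file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.invalid"

printf 'Change1\n' > my_second_committed_file
git add my_second_committed_file
git commit -q -m "Add a file"

# Ask which blob the current commit records for this path.
blob=$(git rev-parse HEAD:my_second_committed_file)
echo "$blob"

# The same ID can be computed straight from the file's contents.
computed=$(git hash-object my_second_committed_file)

# Inspect the object: its type, then what it contains.
kind=$(git cat-file -t "$blob")
content=$(git cat-file -p "$blob")
echo "$kind: $content"
```

Because a blob ID depends only on the contents, two files with the same contents share one blob, whichever commit or path they appear under.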
--- a/my_second_committed_file
+++ b/my_second_committed_file

The next two lines tell us which is the original file (---) and which is the new file (+++), so we can use this as a reference.
@@ -1,2 +1 @@
-Change1
-Change2
+Changed this file completely

Now we see a group of lines which are generally referred to as a hunk. The hunk header carries two important pieces of information. The first section, -1,2, refers to the original file (-), and tells us that the hunk starts at line 1 (1) and covers two lines (2) of it. The next section, +1, refers to the new file (+): the hunk starts at line 1, and because the comma and the count are omitted, we can infer that it covers only one line.

The three following lines show what change actually took place. Strings Change1 and Change2 were deleted from the file, whereas Changed this file completely was added to the file.
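The rule for reading a hunk header can be written down as a few lines of shell. This is only an illustrative sketch: parse_hunk_header is a name invented here, not a Git command, and it prints the old start, old count, new start and new count, filling in a count of 1 wherever the header omits it.

```shell
# parse_hunk_header HEADER
# Prints: old_start old_count new_start new_count
parse_hunk_header() {
  printf '%s\n' "$1" |
    # Pull the four numbers out; a missing count leaves an empty field.
    sed -n -E 's/^@@ -([0-9]+)(,([0-9]+))? \+([0-9]+)(,([0-9]+))? @@.*/\1:\3:\4:\6/p' |
    awk -F: '{
      old_count = ($2 == "") ? 1 : $2   # omitted count means one line
      new_count = ($4 == "") ? 1 : $4
      print $1, old_count, $3, new_count
    }'
}

parse_hunk_header '@@ -1,2 +1 @@'    # prints: 1 2 1 1
parse_hunk_header '@@ -1 +1,2 @@'    # prints: 1 1 1 2
parse_hunk_header '@@ -0,0 +1 @@'    # prints: 0 0 1 1
```

The last header is the one Git prints for a brand new file: zero lines of the original, starting at line 0, become one line in the new file.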

Looking at the next diff segment, we can see it applies to a different file. Essentially this segment is no different from the last; the only interesting portion is shown below.
new file mode 100644
index 0000000..5d27866
--- /dev/null
+++ b/my_third_committed_file

This shows us that my_third_committed_file is actually a new file. Notice the /dev/null and the 0000000 object ID, indicating that there was no original file.
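When a commit touches many files, scanning the full diff for /dev/null lines gets tedious. As a sketch, the commands below ask Git for a summary instead. The scratch repository mimics the file names used above, but the identity and the first commit message are made up, and the IDs will differ from John's.

```shell
# Scratch repository: one commit, then a second that edits one file
# and adds another.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.invalid"

printf 'Change1\nChange2\n' > my_second_committed_file
git add my_second_committed_file
git commit -q -m "Initial file"

printf 'Changed this file completely\n' > my_second_committed_file
printf 'Addition to the line\n' > my_third_committed_file
git add my_second_committed_file my_third_committed_file
git commit -q -m "Messed with a few files"

# One status letter per file: M for modified, A for added.
git diff --name-status HEAD~1 HEAD

# Keep only the names of the files that were added.
new_files=$(git diff --name-only --diff-filter=A HEAD~1 HEAD)
echo "$new_files"
```

Here only my_third_committed_file is reported as added, matching what the /dev/null line told us in the full diff.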

Diffing Over A Range

All the operations that we have performed so far have been on one commit. Whilst important and valuable, it may be that you want to see an entire range of changes.
In the trenches...
"I'm still not entirely convinced about this John," said Martha. "I've been playing around with Git, like you asked me, but it still just seems like we're replicating the work that we used to do with the readme changelogs and the tarball files."

She sat down on a near-by chair and wheeled it over to John's desk. She surveyed the desk for an inch of vacant real estate before finally resting her elbow on the corner of his desk next to a copy of Pro Git.

"Well, actually Martha, I can see exactly what you mean. Up until now, there has been no difference between the old and the new process. I'm still in control of all the versions, so nothing has really changed." He thought long and hard. "Tell ya what. Why don't you give me an operation that you've always wanted to be able to do easily against our code tree tarballs?"

"Easy," she snapped back, "I want to know what changes were made for the last two weeks whilst I had been away on holiday." She smiled an almost mischievous smile as she referenced 'The Incident', as it had become known throughout the office.

"Easy," John quipped, mimicking her mannerisms. The two broke out in laughter. "We can use git log for that, and I think there are some date options too. Let me check the man page."

Looking at the man page for git log is a mind trip for the uninitiated. Weighing in at over 600 lines of text, it is abundantly clear that this tool does a whole lot more than viewing a simple history of commits to the repository. It is well worth taking the time to read through the currently available options by typing man git-log on the command line (git help log does the same job). If you have the documentation installed, this will yield the man page for git log.

Listing all commits in our repository is useful; being able to filter this output is fantastic. This is one area in which the developers of Git have invested a great deal of time and effort. For example, we can use git log to show us not only the commit message, but a diff for each commit entry in the output as well. Now, whilst we are further empowered by having the diff for each commit, listed newest first, we can take things further by filtering the commits.

Suppose we want to view all the commits that we made in the last week. Typing the following into the command line in our test repository yields the result below.
john@satsuki:~/coderepo$ git log -p --since="last week"
commit a022d4d1edc69970b4e8b3fe1da3dccd943a55e4
Author: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 22:05:55 2011 +0100

Messed with a few files

diff --git a/my_second_committed_file b/my_second_committed_file
index 095b9cd..c9887f8 100644
--- a/my_second_committed_file
+++ b/my_second_committed_file
@@ -1,2 +1 @@
-Change1
-Change2
+Changed this file completely
diff --git a/my_third_committed_file b/my_third_committed_file
new file mode 100644
index 0000000..5d27866
--- /dev/null
+++ b/my_third_committed_file
@@ -0,0 +1 @@
+Addition to the line
...
...

Notice that we see the same diff that was presented before when we ran our git diff HEAD~1..HEAD command, but this time, as we have used the git log command instead, each diff appears beneath the commit entry it belongs to. Including the diff is what the -p flag is for. Take note of the way we have specified the time period that we are interested in. The section --since="last week" tells Git to filter the output and show only the entries that were committed within the last week.

This type of filtering can be exceedingly useful to a developer. Often when problems arise, you cannot point to a definite moment at which your code was last working. However, most of the time you can say with some certainty, "I know it was working two weeks ago". Using the methods described above will give you all of the changes, categorised by commit, that occurred in those two weeks, allowing you to narrow down exactly where to begin looking for the offending changes.
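The "two weeks ago" case might look something like the sketch below. To make the filter visible in a repository created on the spot, the first commit is backdated using Git's date environment variables; the file name, identity, dates and messages are all invented for the example.

```shell
# Scratch repository with one old commit and one recent commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.invalid"

printf 'old\n' > notes.txt
git add notes.txt
# Backdate the first commit; --since and --until compare commit dates.
GIT_AUTHOR_DATE="2011-03-01T12:00:00" \
GIT_COMMITTER_DATE="2011-03-01T12:00:00" \
  git commit -q -m "Old change"

printf 'new\n' >> notes.txt
git commit -q -a -m "Recent change"

# Commits from the last two weeks only.
recent=$(git log --since="2 weeks ago" --format=%s)
echo "$recent"

# The opposite filter: commits older than two weeks.
older=$(git log --until="2 weeks ago" --format=%s)
echo "$older"

# Add -p to see the diff for each commit that survives the filter.
git log -p --since="2 weeks ago"
```

The two options can also be combined to pick out a window in the middle of the history, for instance the fortnight of a holiday.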

If the developer can further categorise the issue, such as, "I know which file the change must have occurred in", then the following example will demonstrate just how easy it is to filter the results even further. Even in the simplified example repository that we have been using, running this command filters the output to a single file.
john@satsuki:~/coderepo$ git log -p --since="last week" -- my_second_committed_file
commit a022d4d1edc69970b4e8b3fe1da3dccd943a55e4
Author: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 22:05:55 2011 +0100

Messed with a few files

diff --git a/my_second_committed_file b/my_second_committed_file
index 095b9cd..c9887f8 100644
--- a/my_second_committed_file
+++ b/my_second_committed_file
@@ -1,2 +1 @@
-Change1
-Change2
+Changed this file completely

commit 9938a0c30940dccaeddce4bb2eb151fba3a21ae5
Author: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 20:34:23 2011 +0100

Finished adding initial files

diff --git a/my_second_committed_file b/my_second_committed_file
index 3ad4cc3..095b9cd 100644
--- a/my_second_committed_file
+++ b/my_second_committed_file
@@ -1 +1,2 @@
 Change1
+Change2

commit 163f06147a449e724d0cfd484c3334709e8e1fce
Author: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 20:32:59 2011 +0100

Made a few changes to first and second files

diff --git a/my_second_committed_file b/my_second_committed_file
new file mode 100644
index 0000000..3ad4cc3
--- /dev/null
+++ b/my_second_committed_file
@@ -0,0 +1 @@
+Change1
john@satsuki:~/coderepo$

See how easy that is? Note that the -- tells Git that whatever follows is a path, rather than a revision or an option. Git can often work this out for itself, but when a name could be read either way the -- is required, so it is a good habit to include it. We no longer have the information for my_third_committed_file present in the output. We have filtered out everything but the information we are looking for. When you are up against deadlines, poring over pages and pages of diffs and changes can be incredibly time consuming. Having the tools available to cut that output down to just the relevant material can be a life saver.
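As a final sketch, here is the path filter on its own in a scratch repository. The file names first_file and second_file, the identity and the commit messages are made up for the example.

```shell
# Scratch repository where each commit touches a different file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"            # hypothetical identity
git config user.email "user@example.invalid"

printf 'one\n' > first_file
git add first_file
git commit -q -m "Add first_file"

printf 'two\n' > second_file
git add second_file
git commit -q -m "Add second_file"

# Everything after -- is treated as a path, never as a revision.
only_second=$(git log --format=%s -- second_file)
echo "$only_second"

# The path filter combines with the other options we have met;
# --stat gives a per-file summary in place of the full diff.
git log --stat --since="last week" -- second_file
```

Only the commit that touched second_file is listed; the commit that added first_file is filtered out in the same way that my_third_committed_file disappeared from our output above.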
