10 Sep 2009, 11:07 p.m.

Git for Subversion Users

As readers may have gathered from previous blog posts, I'm something of a fan of version control and of Subversion in particular.

In recent months it has become increasingly difficult to miss the buzz surrounding Git, a version control system originally developed by Linus Torvalds to aid development work on the Linux kernel. Git is gaining a lot of ground, with a number of major projects (for example Perl, Samba and Wine) having been migrated to it.

So here's a bit of an introduction to Git from a Subversion (and CVS) user's point of view. The post won't be particularly detailed or comprehensive, as there is plenty of documentation on the web, but it should provide some insight into what looks to be a very promising tool that takes an interesting approach to version control.

Distributed Version Control

The main paradigm shift that Subversion and CVS users will come across is that Git is what's known as a distributed version control system. This means that a project in Git does not need a central repository, although a team can still agree to treat one copy as the canonical one. Another way to look at it is that every single checkout/working copy - known as a "clone" - of the project is actually a repository in its own right, with its full history and everything you need to manage it.

Another unfamiliar aspect is that every working copy is effectively a branch, and commits to it are isolated from any kind of notional trunk, or indeed anybody else's clone, until such time as you manually merge two or more clones.

So what are the potential wins gained by taking a distributed approach to VC?

  • If you travel a lot, or otherwise regularly find yourself in situations where you don't have easy, reliable internet access, you can continue to work and make commits without network access.
  • Thus, a developer is not directly dependent on an external server, which greatly reduces the risk of slow response times and server outages preventing the developer from working and committing changes.
  • While Git is designed primarily for collaborative development, dispensing with the requirement for a central repository makes Git an appealing option for individual developers or small teams who don't have the time or inclination to install and maintain a version control repository and server.
  • A user doesn't require privileged access to the repository in order to keep a full history of their own changes.

To elaborate on that last point: if you take an SVN checkout of a project where you don't have commit access, you're free to play with it and make any changes you like, but you can't commit those changes. Thus a lot of the benefits of version control - history, diffs between your own revisions and so on - are not automatically available to you.

With Git, your clone is entirely your own so you can just crack on with coding and committing. If you feel your improvements are ready for public consumption, there will always be ways to contribute them back to a project. Similarly, if things don't work out the way you planned, the entire clone can be discarded, and no one else will be any the wiser.
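To make that concrete, here is a minimal sketch rather than anything definitive. It builds a stand-in "upstream" project in a temporary directory, clones it, and commits to the clone without touching the original. The directory names, the file name and the throwaway identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: a clone is a complete repository and accepts commits with no server involved.
set -eu

# Throwaway identity (assumed values) so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Stand-in for somebody else's published project.
git init -q "$work/upstream"
cd "$work/upstream"
echo 'Hello World' > hello.txt
git add hello.txt
git commit -q -m 'Initial import'

# Our own clone: history included, and entirely ours to commit to.
git clone -q "$work/upstream" "$work/mine"
cd "$work/mine"
echo 'Hello again' >> hello.txt
git commit -q -a -m 'Local experiment'

# Count the commits on each side: the clone has moved on, upstream has not.
upstream_commits=$(git -C "$work/upstream" rev-list --count HEAD)
my_commits=$(git -C "$work/mine" rev-list --count HEAD)
echo "upstream: $upstream_commits commit(s), mine: $my_commits commit(s)"
```

If the experiment doesn't work out, deleting the "mine" directory is all the clean-up there is.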

Git

Another interesting aspect of Git itself is that it is designed to be very fast indeed, and by all accounts the developers have pulled it off. The Linux kernel source consists of tens of thousands of files, so Git simply has to be blazingly fast for the developers to remain productive. If you're finding that your project is large enough that Subversion operations are becoming uncomfortably slow, Git might be worth a look on that basis.

Git is also optimised for merging. I guess that since a Git clone is inherently a branch, and there is no centralised, canonical copy of the project, far more merges are needed than in a Subversion workflow, and so the developers have explicitly made ease of merging a design goal.

Apart from the extra merging, most steps in a day-to-day Git workflow will be quite familiar to a Subversion user, though the process can be a shade more long-winded, so it's worth running through an example. We'll use the little "Hello World" project that the Git folks thoughtfully provide for us to clone and play with.

Getting hold of that project is done using the git clone command:

[simon@vps02 gittest]$ ls
[simon@vps02 gittest]$ git clone git://github.com/git/hello-world.git
Initialized empty Git repository in /home/simon/gittest/hello-world/.git/
remote: Counting objects: 158, done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 158 (delta 54), reused 157 (delta 54)
Receiving objects: 100% (158/158), 15.62 KiB, done.
Resolving deltas: 100% (54/54), done.
[simon@vps02 gittest]$ ls
hello-world

So the hello-world directory has been checked out. Let's imagine we've moved into that directory and edited a file, php.php, to make the code a bit nicer:

[simon@vps02 hello-world]$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   php.php
#
no changes added to commit (use "git add" and/or "git commit -a")
[simon@vps02 hello-world]$ git diff
diff --git a/php.php b/php.php
index 02e264e..3703333 100644
--- a/php.php
+++ b/php.php
@@ -1,3 +1,8 @@
 <?php
-       print("Hello World");
-?>
+/**
+ * Simple PHP "Hello World" program
+ * @package HelloWorld
+ */
+
+echo  'Hello World';
+

There's an extra step required before you can commit the files: you have to "add" them to the commit, which Git calls staging. This is subtly different to svn add in that it applies to existing, modified files as well as newly-created ones, and only what has been added goes into the next commit. Then you're clear to commit:

[simon@vps02 hello-world]$ git add php.php
[simon@vps02 hello-world]$ git commit -m 'Clean up the code'
Created commit 9aa8df6: Clean up the code
 1 files changed, 6 insertions(+), 2 deletions(-)
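The difference from svn add is easier to see with two modified files. The following is only a sketch in a scratch repository; the file names and the identity are invented for the purpose. Both files are edited, only one is added, and the commit picks up just that one.

```shell
#!/bin/sh
# Sketch: only changes that have been "added" (staged) go into the next commit.
set -eu

# Throwaway identity (assumed values) so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/staging-demo"
cd "$work/staging-demo"
echo 'one' > a.txt
echo 'two' > b.txt
git add a.txt b.txt
git commit -q -m 'Initial import'

# Modify both tracked files, but stage only one of them.
echo 'one, edited' > a.txt
echo 'two, edited' > b.txt
git add a.txt

staged=$(git diff --cached --name-only)   # what the next commit will contain
git commit -q -m 'Edit a.txt only'
left_over=$(git status --porcelain)       # b.txt is still modified and uncommitted

echo "committed: $staged"
echo "still pending:$left_over"
```

Running git commit -a instead would have staged and committed every modified tracked file in one go, which is closer to what svn commit does.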

That's almost all there is to it. Of course, all that's happened so far is that you've committed your changes to your local clone. Changes move between clones via git fetch, git pull and git push, and the merging itself is done via the various options to the git merge command, which will be reasonably intuitive to Subversion users.
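To give a flavour of merging two clones, here is a rough sketch under some loud assumptions: "alice" and "bob" are hypothetical developers, both clones live in a temporary directory on one machine, and the two sets of changes touch different files so there is nothing to conflict.

```shell
#!/bin/sh
# Sketch: two clones diverge, then one fetches the other's work and merges it.
set -eu

# Throwaway identity (assumed values) so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

git init -q "$work/alice"
cd "$work/alice"
echo 'Hello World' > hello.txt
git add hello.txt
git commit -q -m 'Initial import'
branch=$(git symbolic-ref --short HEAD)   # default branch name varies between Git versions

git clone -q "$work/alice" "$work/bob"

# Bob commits in his clone...
cd "$work/bob"
echo 'notes from bob' > bob.txt
git add bob.txt
git commit -q -m 'Add bob.txt'

# ...while Alice commits in hers, so the two histories have diverged.
cd "$work/alice"
echo 'notes from alice' > alice.txt
git add alice.txt
git commit -q -m 'Add alice.txt'

# Alice fetches Bob's branch and merges it into her own.
git fetch -q "$work/bob" "$branch"
git merge -q -m "Merge Bob's work" FETCH_HEAD

total_commits=$(git rev-list --count HEAD)
echo "alice now has $total_commits commits and these files:"
git ls-files
```

git pull rolls the fetch and the merge into a single step; they are kept separate here only to show where the merge actually happens.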

A related option which I was impressed by was git format-patch, which creates a .patch file for each commit containing a diff of your changes, formatted as an email message which can be sent using git send-email. That could be a really convenient way to contribute changes to a project where you don't have direct commit privileges.
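As a sketch of how that round trip might look, the following stays on one machine and skips the email step entirely: a hypothetical contributor writes a patch file with git format-patch, and the project owner applies it with git am. The repository names, file contents and identity are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: export a commit as a patch file, then apply it in another repository.
set -eu

# Throwaway identity (assumed values) so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# The project we have no commit rights on.
git init -q "$work/project"
cd "$work/project"
echo 'print("Hello World");' > hello.txt
git add hello.txt
git commit -q -m 'Initial import'

# Contributor side: clone, commit, export the latest commit as a patch.
git clone -q "$work/project" "$work/contributor"
cd "$work/contributor"
echo "echo 'Hello World';" > hello.txt
git commit -q -a -m 'Clean up the code'
git format-patch -q -1 -o "$work/patches"   # -1 means "just the most recent commit"
patch_file=$(ls "$work/patches")

# Maintainer side: apply the patch, keeping the author and commit message.
cd "$work/project"
git am -q "$work/patches/$patch_file"
applied_subject=$(git log -1 --format=%s)
echo "$patch_file applied as: $applied_subject"
```

In real use the patch file would travel by email, via git send-email or as an ordinary attachment, before the maintainer runs git am on it.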

Export/Import from Subversion

Should you decide to take the plunge and migrate to Git, it's surprisingly easy. There's a git svn command which allows you to take a clone:

[simon@vps02 gittest]$ git svn clone svn://svn.example.org/hellosite/ svn_copy/
Initialized empty Git repository in .git/
W: Ignoring error from SVN, path probably does not exist: (160013): Filesystem has no item: File not found: revision 100, path '/hellosite'
W: Do not be alarmed at the above message git-svn is just searching aggressively for old history.
This may take a while on large repositories
r374 = e07d89a8b3102b91c00103a6f84d0ce3391a6530 (git-svn)
        A       hellosite/hello.jpg
        A       hellosite/index.html
r375 = 4352300c26ad5c221ff9835f2cc144d552899d5c (git-svn)
        M       hellosite/index.html
## many lines of output snipped
Checked out HEAD:
  svn://svn.example.org/hellosite/ r564

That should work in most cases, but for more complicated repository structures there's also a nifty script named svn2git which is designed to make sure that all tags and branches get copied across correctly.

What's really nice is that if you can't or don't want to migrate a project away from Subversion, you can still take advantage of Git's offline commit functionality. Having taken a local Git clone of an SVN project as above, and made some changes over several commits, the entire history can be pushed back to Subversion like so:

[simon@vps02 svn_copy]$ git svn dcommit
Committing to svn://svn.example.org/hellosite ...
        M       index.html
Committed r580
        M       index.html
r580 = c21a9ccb6fbe5ddecc3969da03ddeb23bf66cfe4 (git-svn)
No changes between current HEAD and refs/remotes/git-svn
Resetting to the latest refs/remotes/git-svn
index.html: needs update
        M       index.html
Committed r581
        M       index.html
r581 = 2e2ac5f5e0333ef6636d7280fc66e153b7de9196 (git-svn)
No changes between current HEAD and refs/remotes/git-svn
Resetting to the latest refs/remotes/git-svn

What's absolutely wonderful is that Git creates an SVN revision for each commit, just as though you had made the commits directly in SVN. This is a truly compelling feature.

Conclusions

So Git is certainly interesting. I'm left with the question "will I use it?", and the answer is "probably, at some point". I'm keen to test-drive it on a smallish real project, should one come along, and see how it feels to live and work with it for a while. Whilst Git is clearly ready to play with the big boys, I do have some misgivings.

By making it so easy for individual developers to work for days or weeks at a time on a local copy of a project's code in arbitrary places, there's increased potential for work to get lost due to some disaster like a hard-drive failure or an absent-minded rm -rf *. At least with a centralised repository there's a single tranche of files that can be backed up regularly as part of normal operations.

I would also worry about "branching hell", whereby you end up with countless diverging copies of a project's code, and no idea how to reconcile them. This is something I'm continuously at pains to avoid in a professional environment, and the fact that Git effectively makes branching obligatory before a developer can even begin work does suggest that things could get out of hand quite quickly.

I also get a sense that, as Git is a relatively young project, tool support for it might be a little limited at this point in time. Before adopting Git in anger I'd like to see seamless support for it in tools like Phing, and in the various Continuous Integration systems, and perhaps a nifty web interface along the lines of WebSVN. I imagine some of this stuff is probably out there, and support will in any case improve, but in general terms I like to see a very mature ecosystem around something as important as a version control system prior to adoption.

Posted by Simon in Version Control
11 Sep 2009, 8:49 a.m.

Ciaran McNulty

It certainly looks pretty easy to start trying Git, and I can see how it could be useful.

At work we have a massive codebase in terms of numbers of files, so faster operations sound pretty compelling!

I guess it would all come down to how easily individual Git versions can be merged.

17 Dec 2009, 12:49 p.m.

Pablo

I've been using git to interact with svn for a while and it's pretty neat.

I have tested branching and merging and it works much better than subversion (1.4). Tool support is also quite complete, including web viewer and even a tortoise.

The main reason I'd switch to git is branch management and merging. We are currently working on 3 or 4 development branches at the same time (including release branch and trunk) and it becomes messy when you want to merge a development branch back to the trunk and vice versa.

My 2 cents

25 Mar 2011, 6:01 a.m.

Rajesh Manickadas

Excellent post, clearly brought out the advantages of using Git. Thanks.

4 Jun 2013, 9:12 p.m.

max weber

The problem with Git is that it can never resolve conflicts. Svn and CVS have always been good at automatically merging. Other source control systems (commercial ones) are bad at it. Git is pretty bad at it from my little experience. You get to do it manually, like with the commercial products.