Mercurial Upgrade

April 4, 2007

Say hello to Mercurial, my long-overdue replacement for CVS. Unlike CVS and Subversion, Mercurial is a distributed version control system (VCS), which means (among other things) it doesn't have a central repository, has disconnected (non-networked) commits, and allows you to group small changes together as "change sets". Other well-known distributed VCSs include BitKeeper, Git, Darcs, and Monotone (there are more). While searching for a CVS replacement, I spent some time using Subversion, Monotone, and Git; here's a brief overview of my experience with each one.

  • Subversion: Subversion is likely the most popular VCS, so you're probably already familiar with it. I'll dispense with the pleasantries and skip straight to the problems.

    For the past several months I've been using a private Subversion repository at home for small projects, snippets of code, configuration files, scripts, and various other knick-knacks. Along the way, I noticed several things about Subversion that bother me. For example, until version 1.4, common operations like svn status and svn commit were uncomfortably slow. They're better now, but still not as fast as I'd like. Copying and moving large groups of files is still painfully slow (moving several hundred megabytes of files took me well over 20 minutes).

    Branching in Subversion is primitive (and slow, since a branch is really just a copy). For me this is a major problem: in addition to tracking revisions, I want to use branches as quick, version-controlled staging areas for new features, and in Subversion branches are expensive and merging is kind of wimpy.

    It's a genuine hassle to require network access for commits; I regularly work remotely, and even though I have VPN access (courtesy of OpenVPN), it's still kind of distracting to wait for common commands like commit, add, and copy. My alternatives? Move the repository to a public server with better bandwidth (which makes it slower for me to access while I'm at home; plus, it's not really private anymore, and I'm still dependent on a network connection), or hold off on commits until I'm at home (which is contrary to committing in small, incremental changes, my preferred modus operandi).

    Finally, and most importantly, Subversion is centralized. Why does that matter? Centralization imposes all sorts of workflow restrictions that haven't been necessary since VHS tapes went out of style. For example, I have roughly five gadzillion projects in various states of brokenness and disarray that I'm just not ready to publish. Distributed version control systems have no central repository except one that is designated by convention, so I can commit locally, push to my private repository when it's convenient, and publish to the public repository when I'm damn good and ready. Subversion can't do any of this without cheating, of course, so I'm forced to either migrate projects to the public repository without their history or use svndumpfilter chicanery to bludgeon Subversion into doing something it should be able to do out of the box. Which sounds an awful lot like trying to copy and move files in CVS. Which is why we were supposed to upgrade to Subversion in the first place. Oops...

    (Subversion isn't all bad, by the way. It's certainly a huge improvement over CVS. It integrates well with Rails, Eclipse, and all the other fancy toys kids use these days. Plus, Subversion has all sorts of nifty companion tools like TortoiseSVN and Trac. I use Subversion daily at work. I just need things Subversion doesn't support by design.)

    I'd be derelict in my blog posting responsibilities if I didn't mention SVK, a distributed VCS built on Subversion that supports repository mirroring and disconnected operation. I can't say much about SVK because I don't have much experience with it, although I'm fairly sure SVK has neither the speed nor the power of Mercurial and Git. Personally, I don't really see the point of keeping Subversion around for the sake of keeping Subversion around, particularly in light of Subversion's marvelously atrocious track record with repository corruption.

  • Monotone: I wanted to like Monotone. I stumbled across a reference to it in the SQLite documentation, and spent several months putting up with Monotone's warts after Linus plugged it on the LKML. I like the extensive documentation, simple command-line interface, Lua hooks, and proper Windows support. Internally, Monotone makes extensive use of strong cryptographic primitives, which I wholeheartedly support.

    Unfortunately, Monotone is slow. Dog slow. An initial repository pull (checkout, in CVS parlance) is so slow that many Monotone users provide a publicly downloadable snapshot of the initial pull instead. The last time I used Monotone, the crypto certs were their own special blend; I'd prefer either OpenPGP or X.509.

    (Oddly enough, my first look at Mercurial was right after I started testing Monotone. I wasn't initially interested in Mercurial because I was still stuck on Monotone. I didn't feel like Mercurial offered much more than Monotone, and I hadn't fully appreciated the speed difference between the two.)

  • Git: Git was (is) right at the top of the list. It's fast, possibly (probably?) even faster than Mercurial. It has features Mercurial doesn't support (rebase, for example, although I believe that can be clumsily emulated with bundle and unbundle). Keith Packard wrote a post titled "Repository Formats Matter", advocating Git for X.org. His post briefly mentions Mercurial in a positive light, but dismisses it prematurely for what I think is a completely asinine reason: old, obscure ftruncate() bugs in the Linux kernel (see this post on the Mercurial mailing list for a more thorough rebuttal of Keith Packard's ftruncate() silliness).

    I only have two real gripes with Git: the Windows support sucks (it half-works via Cygwin, which doesn't really count), and the command-line interface makes me feel stupid.

    The second one is the deal-breaker for me. While I may not be Mensa material, I've spent enough time using version control that I feel like I should be able to get at least the gist of a new VCS in a couple of minutes, be comfortable with it within a day or so, and be proficient with it within about a week.

    I don't really think that's unreasonable. Even if it is, so what? A VCS is a tool, one that's supposed to make my life easier. If I can't use it without consulting the documentation every couple of minutes, then it's just getting in my way.

    I simply refuse to waste my time learning the nuances of an interface that is complex for no other reason than that its programmers couldn't see far enough past their own idiosyncratic whims to provide an interface without the learning curve of a black diamond ski slope. This is particularly true for an application like Git that has few, if any, tangible benefits when compared to its more intuitive counterparts.

The silver lining here is that I eventually stumbled on Mercurial. And by stumbled, I mean Richard (richlowe) told me about it (just like he told me about Vim, Screen, Mutt, Ruby, and a whole lot of other cool stuff I use regularly). He knows a lot more about version control software than I do, but I didn't really pay any attention. At least not until I noticed that Mercurial seemed to be the only free VCS that wasn't enclosed in a long and colorful string of profanity when he talked about it.

Anyway, the more I use Mercurial, the more I like it. It meets all of the requirements I mentioned above, plus it has the speed and power of Git and the simplicity of Subversion and CVS. Mercurial is actively developed, has full Windows support, and includes extensions that add support for PGP-signed tags and Quilt-style patch queues.

The real killer feature for me, though, is that everything I try just works. Setting up a read-only, web-accessible public repository only took a minute or two of reading, and making an entire directory of Mercurial repositories available only took a couple more minutes. I had comparable experiences with branching, tagging, signing tags, and pushing changes to multiple repositories.

The only warts I've found in Mercurial so far are minor: the web interface needs a bit of cleanup, and there should be a straightforward way of adding repository defaults like style, contact, and archive formats via the top-level hgwebdir configuration file. The native import features are still a bit lacking, although you can use Tailor to convert data from all but the most esoteric or convoluted repositories.

That's about all the advocacy I can muster up at the moment. If you're interested in reading more about the state of distributed version control systems, there are more detailed VCS comparisons here and here.

Note: I've had this post sitting in my queue for months. I just brushed off the cobwebs, cleaned up the typos, and posted it. In that time Mercurial has picked up a bit of publicity, and development has been moving along at a steady clip. I tried to remove the bits that no longer apply, but let me know if I missed anything.

Edit: This article was linked on Reddit; some additional conversation (and my responses) can be found in the comment thread.