Mercurial repository conversion

Have you ever needed to split a repository, or extract just a few directories while retaining their history? Maybe your repo contains too many (possibly unrelated) projects, or has grown so big that you can't even clone it?

Mercurial's convert extension is here to help. It's a multitool that can convert from various other VCSs, such as Git and SVN, and also from Mercurial itself. The last option is the one we need.

Why

The benefits of splitting large repositories are:

  • teams can work independently and move at different speeds
  • you can give someone (an outsourcer perhaps) access to only some parts of your codebase
  • smaller repos are easier to manage
  • some CI systems (like AppVeyor or Travis CI) use a single configuration file per repo - stuffing multiple projects into one such file just complicates the build and obscures the results

Powering up convert with some scripts

convert is a rather low-level tool and needs a few configuration files and command-line options to work the way you want. And let's face it - you won't get it right the first time and will need to do some tweaking.

That's why I created a few PowerShell scripts and template files to help with the conversion.

Configuration

  1. Enable the convert extension in mercurial.ini:

    [extensions]
    convert =
    
  2. Clone or download this gist. I recommend creating a separate directory (and possibly version-controlling it) for every conversion you make and copying these files there.

  3. Create two files: branchmap.txt and filemap.txt (you may copy them from branchmap.sample.txt and filemap.sample.txt). These are the config files we will use to tell Mercurial which directories to include in the converted repo and how to treat branches. As you will see, these files support an extended syntax compared to what convert itself understands. They are then used to generate the real branchmap/filemap files for Mercurial.

Now we need to fill in these config files.

Sample repository

Let's use the repo at https://bitbucket.org/heavymetaldev/convert-me as an example. The structure looks like this:

|-- convert-me
    |-- .hgignore
    |-- top-secret.txt
    |-- sln
    |   |-- MyProject.Core
    |   |   |-- MyProject.Core.sln
    |   |-- MyProject.Desktop
    |       |-- MyProject.Desktop.sln
    |-- src
        |-- MyProject.Core.Api
        |   |-- MyProject.Core.Api.csproj
        |-- MyProject.Core.Model
        |   |-- MyProject.Core.Model.csproj
        |-- MyProject.Core.Utils
        |   |-- MyProject.Core.Utils.csproj
        |-- MyProject.Desktop.WinForms
            |-- MyProject.Desktop.WinForms.csproj

This repository contains C# projects, but the scripts and methods described here can be applied equally well to any other Mercurial repo.

There are two solution files, MyProject.Core and MyProject.Desktop. I want to move these solutions to two separate repositories (repo-a and repo-b), along with the projects they refer to. Additionally, I want to remove the top-level file top-secret.txt, as it contains confidential data.

Repo A should look like this:

|-- Repo-A
    |-- .hgignore
    |-- sln
    |   |-- MyProject.Core
    |   |   |-- MyProject.Core.sln
    |-- src
        |-- MyProject.Core.Api
        |   |-- MyProject.Core.Api.csproj
        |-- MyProject.Core.Model
        |   |-- MyProject.Core.Model.csproj
        |-- MyProject.Core.Utils
        |   |-- MyProject.Core.Utils.csproj

Repo B should contain the remaining projects and files:

|-- Repo-B
    |-- .hgignore
    |-- sln
    |   |-- MyProject.Desktop
    |       |-- MyProject.Desktop.sln
    |-- src
        |-- MyProject.Desktop.WinForms
            |-- MyProject.Desktop.WinForms.csproj

Filemap on steroids

Let's start with the filemap. It defines which files or directories should be included in (or excluded from) the new repository. You may also use it to rename files.

The extended filemap format supports lines in the following forms:

# this is the basic mercurial stuff:
include path/to/file
exclude path/to/file
rename from/file to/file

# this is extended format:
include r:regex/to/.*/include
include r:!regex/to/.*/include/if/not/match
exclude r:regex/to/.*/exclude
exclude r:!regex/to/.*/exclude/if/not/match
include sln:path/to/something.sln

  • r: indicates that this entry is a regex. r:! is a negated regex (i.e., everything that does not match this pattern).
  • sln: is specifically for C# solution files. This will parse the .sln file and generate include entries for every csproj it contains. In other words, this will include the whole solution.

Let's look at our sample repo. For converting to repo-a, we can use the following filemap.txt content:

include .hgignore
include r:.*/MyProject\.Core(\..*){0,1}/
exclude top-secret.txt

Once a filemap contains include entries, everything that's not included gets excluded, so the last line isn't really necessary, but we'll leave it there to be explicit.

This will generate the following filemap.gen.txt for mercurial to use:

include ".hgignore"
include "sln/MyProject.Core"
include "src/MyProject.Core.Api"
include "src/MyProject.Core.Model"
include "src/MyProject.Core.Utils"
include "src/MyProject.Core.Api/App_Data"
include "src/MyProject.Core.Api/App_Start"
include "src/MyProject.Core.Api/Controllers"
include "src/MyProject.Core.Api/Models"
include "src/MyProject.Core.Api/Properties"
include "src/MyProject.Core.Api/Service References"
include "src/MyProject.Core.Api/Service References/Application Insights"
include "src/MyProject.Core.Model/Properties"
include "src/MyProject.Core.Utils/Properties"
exclude top-secret.txt

Some of these entries are in fact redundant. Once we include a directory, there is no need to include all its subdirectories. But since the file is autogenerated, this is not a concern.

For repo-b, I will go minimalist and use the sln: prefix:
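To make the regex expansion more concrete, here is a minimal Python sketch of the idea. It is not the actual PowerShell script from the gist: the function name expand_filemap, the directories argument, and the choice to test each pattern against the directory path with a trailing slash are all assumptions made for illustration.

```python
import re


def expand_filemap(lines, directories):
    """Expand extended filemap lines into plain filemap lines.

    `directories` is a list of repo-relative directory paths using "/".
    Assumption: an r: pattern is tested against "<dir>/" (trailing slash).
    """
    out = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        directive, _, arg = line.partition(" ")
        if directive in ("include", "exclude") and arg.startswith("r:"):
            pattern = arg[2:]
            negate = pattern.startswith("!")
            if negate:
                pattern = pattern[1:]
            regex = re.compile(pattern)
            for d in directories:
                matched = regex.match(d + "/") is not None
                # r:  keeps directories that match,
                # r:! keeps directories that do not match.
                if matched != negate:
                    out.append(f'{directive} "{d}"')
        else:
            out.append(line)  # plain entries pass through unchanged
    return out


# Top-level directories of the sample repo (subdirectories omitted).
sample_dirs = [
    "sln/MyProject.Core",
    "sln/MyProject.Desktop",
    "src/MyProject.Core.Api",
    "src/MyProject.Core.Model",
    "src/MyProject.Core.Utils",
    "src/MyProject.Desktop.WinForms",
]
sample_filemap = [
    "include .hgignore",
    r"include r:.*/MyProject\.Core(\..*){0,1}/",
    "exclude top-secret.txt",
]
for entry in expand_filemap(sample_filemap, sample_dirs):
    print(entry)
```

With only the top-level directories listed, this produces one include line per Core directory and leaves the Desktop ones out. The real script evidently also walks subdirectories, which is where the extra entries in the generated file come from.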

include .hgignore
include sln:sln/MyProject.Desktop/MyProject.Desktop.sln
exclude top-secret.txt
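
As a rough illustration of what an sln: entry involves, the Python sketch below pulls the project paths out of a solution file and turns them into include lines. It is only a sketch under stated assumptions: the function name sln_includes and the sample solution text (including its GUIDs) are made up, and whether the real script also includes the .sln file itself is not something this example tries to reproduce.

```python
import posixpath
import re

# Matches lines such as:
# Project("{TYPE-GUID}") = "Name", "relative\path\Name.csproj", "{PROJECT-GUID}"
PROJECT_LINE = re.compile(r'^Project\("\{[^}]+\}"\)\s*=\s*"[^"]*",\s*"([^"]+)"')


def sln_includes(sln_path, sln_text):
    """Return include lines for the directory of every .csproj in a solution.

    `sln_path` is the repo-relative path of the .sln file (using "/"),
    `sln_text` is the content of that file.
    """
    sln_dir = posixpath.dirname(sln_path)
    includes = []
    for line in sln_text.splitlines():
        m = PROJECT_LINE.match(line.strip())
        if not m:
            continue
        rel = m.group(1).replace("\\", "/")
        if not rel.lower().endswith(".csproj"):
            continue  # e.g. solution folders, which are not real projects
        # Project paths are relative to the directory of the .sln file.
        full = posixpath.normpath(posixpath.join(sln_dir, rel))
        includes.append(f'include "{posixpath.dirname(full)}"')
    return includes


# Hypothetical solution content; the GUIDs are placeholders.
sample_sln = r'''
Microsoft Visual Studio Solution File, Format Version 12.00
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "MyProject.Desktop.WinForms", "..\..\src\MyProject.Desktop.WinForms\MyProject.Desktop.WinForms.csproj", "{11111111-1111-1111-1111-111111111111}"
EndProject
'''
print(sln_includes("sln/MyProject.Desktop/MyProject.Desktop.sln", sample_sln))
```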

Conversion Process

We will use the hg-convert.ps1 script to do the conversion. Sample usage:

PS> .\hg-convert path/to/source/convert-me path/to/target/repo-a -startrev 123

This script takes care of configuring and calling hg convert. It will:

  1. Take filemap.txt (if it exists), generate filemap.gen.txt and pass it to convert
  2. Take branchmap.txt (if it exists), generate branchmap.gen.txt and pass it to convert (more on branchmap later)
  3. Check if the target repository already exists (use -force to force overwrite)
  4. Convert the repository at path/to/source/convert-me, starting at revision 123 and save it at path/to/target/repo-a
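
For orientation, the call that such a wrapper ends up making can be sketched in Python as below. This is an assumption about what the script does internally, not a copy of it: the function name build_convert_command is made up, and mapping -startrev to the convert.hg.startrev config option is a guess. The --filemap and --branchmap options and the convert.hg.startrev setting themselves are part of the convert extension.

```python
def build_convert_command(source, target, startrev=0, filemap=None, branchmap=None):
    """Build an `hg convert` argument list (a sketch, not the gist's script)."""
    cmd = ["hg", "convert", "--config", f"convert.hg.startrev={startrev}"]
    if filemap:
        cmd += ["--filemap", filemap]      # generated filemap, if any
    if branchmap:
        cmd += ["--branchmap", branchmap]  # generated branchmap, if any
    cmd += [source, target]
    return cmd


cmd = build_convert_command(
    "path/to/source/convert-me",
    "path/to/target/repo-a",
    startrev=123,
    filemap="filemap.gen.txt",
    branchmap="branchmap.gen.txt",
)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```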

startrev specifies the revision at which the conversion should start: that revision and all of its descendants are converted. If you specify 0 (the default), the whole repository is converted, which may take considerable time if the repo is big. For testing purposes, I recommend starting with the latest revision. This way only that one revision is converted, and you can check whether the filemap includes everything you need. My process is as follows (it should save you some time and frustration):

  1. Set up the filemap
  2. Convert only the newest revision, using the startrev parameter, e.g.:

    PS> .\hg-convert ../convert-me ../repo-a -startrev 55
    
  3. Check the converted repository - try to build everything

  4. Copy missing files from old repo to the new repo and add them to filemap, until the new repo builds properly

  5. Repeat from step 2 until I get it right

  6. Start the full conversion from revision 0

    PS> .\hg-convert ../convert-me ../repo-a -startrev 0
    

If everything goes right, we now have two separate repositories, repo-a and repo-b. Notify other developers of the change, so no one tries to push to the old repo (renaming or removing it might also be a good way to prevent this).

But wait, there are some other scenarios we should cover.

Automated branchmap

The branchmap defines the mapping between branch names in the old repo and the new repo. branchmap.txt supports the following line formats:

# this is the basic mercurial stuff:
original_branch_name new_branch_name

# this is extended format:
r:release/.* release
r:!release default
* default

Similar to the filemap, r: and r:! denote a regex that must match or must not match. A single * means - you guessed it - "everything".

For example, my branchmap.txt could look like this:

* default
r:release/.* release
dev dev

All branches that match the release/.* pattern will be renamed to release. Branch dev will remain dev. Everything else will be renamed to default.

Note that the order matters here. If a branch matches multiple patterns, the last one always wins. So, start with the most generic one. If you write * default at the end of the file, everything before it will effectively be ignored. You may want to inspect branchmap.gen.txt to check that everything looks the way you wanted.
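
The "last match wins" rule is easy to get wrong, so here is a small Python sketch of it. It is an illustration only: the function names, the use of a prefix-anchored regex match, and the choice to leave unmatched branches unchanged are assumptions, not a description of the gist's implementation.

```python
import re


def parse_branchmap(lines):
    """Parse extended branchmap lines into (pattern, new_name) pairs."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, new_name = line.rsplit(None, 1)
        rules.append((pattern, new_name))
    return rules


def matches(pattern, branch):
    """Check one extended pattern against a branch name."""
    if pattern == "*":
        return True
    if pattern.startswith("r:!"):
        return re.match(pattern[3:], branch) is None
    if pattern.startswith("r:"):
        return re.match(pattern[2:], branch) is not None
    return pattern == branch


def resolve(rules, branch):
    """Return the new name for `branch`; later rules override earlier ones."""
    result = branch  # assumption: unmatched branches keep their name
    for pattern, new_name in rules:
        if matches(pattern, branch):
            result = new_name
    return result


rules = parse_branchmap(["* default", "r:release/.* release", "dev dev"])
for branch in ["default", "dev", "release/1.0", "feature/login"]:
    # One "old new" pair per line, the plain format convert understands.
    print(branch, resolve(rules, branch))
```

Moving * default to the end of the list makes every branch resolve to default, which is exactly the pitfall described above.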

Appending revisions to existing repo

The last thing I want to mention is appending parts of history onto an existing repo. Let's go back to our convert-me repo. The news of switching to the new repositories hasn't reached one developer, who just pushed some critical changes to MyProject.Core.Model and MyProject.Desktop.WinForms in the convert-me repo (instead of repo-a and repo-b respectively) - let's call them "offending changes". How do we transfer these changes to the new repos without breaking anything? Run convert again? But that would recreate the repositories, effectively breaking them for everyone who has them checked out.

convert gives us a way to append parts of converted history into an existing repo. And this is exactly what we need in this case. We will:

  1. Identify the offending changes (starting from the first revision that hasn't been converted before)
  2. Check the parent of the offending changes and find the corresponding commits in repo-a and repo-b
  3. Convert offending changes and append them onto these corresponding parent commits. We will use the same filemaps and branchmaps to filter only required files.
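
The text above does not say which convert feature the scripts rely on for this. One documented option is a splicemap, a file that tells convert which parent(s) to give a source revision. The Python sketch below builds such an entry and the matching command line, assuming that approach. The function names, the file name splicemap.txt, and both hashes are placeholders for illustration.

```python
import re

HEX40 = re.compile(r"^[0-9a-f]{40}$")


def splicemap_line(source_rev, dest_parent):
    """Return a splicemap entry: '<source revision> <new parent>'.

    Both arguments are expected to be full 40-character hex hashes.
    """
    for rev in (source_rev, dest_parent):
        if not HEX40.match(rev):
            raise ValueError(f"expected a full 40-character hex hash, got {rev!r}")
    return f"{source_rev} {dest_parent}"


def build_append_command(source, target, source_rev, splicemap="splicemap.txt",
                         filemap=None, branchmap=None):
    """Build an `hg convert` call that starts at the first offending change."""
    cmd = ["hg", "convert",
           "--config", f"convert.hg.startrev={source_rev}",
           "--splicemap", splicemap]
    if filemap:
        cmd += ["--filemap", filemap]
    if branchmap:
        cmd += ["--branchmap", branchmap]
    cmd += [source, target]
    return cmd


# Placeholder hashes: the first offending change in convert-me and the
# commit in repo-a that corresponds to its parent.
first_offending = "a" * 40
parent_in_repo_a = "b" * 40
print(splicemap_line(first_offending, parent_in_repo_a))
print(" ".join(build_append_command("../convert-me", "../repo-a", first_offending,
                                    filemap="filemap.gen.txt")))
```

The splicemap line would be written to splicemap.txt before running the command. Try this on a clone of the target repo first, since a wrong parent hash is much easier to fix before anyone has pulled the result.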

That's it. I hope you find this helpful, and if you have any problems with the scripts, please drop me a line!

Resources