Have you ever needed to split a repository, or extract just a few directories while retaining their history? Or maybe your repo contains too many (possibly unrelated) projects, or has grown so big that you can’t even clone it?
Mercurial’s convert extension is here to help. It’s a multitool that can convert from various other version control systems, like Git and Subversion, and also from Mercurial itself, which is what we need here.
Why
The benefits of splitting large repositories are:
- teams can work independently and move at different speeds
- you can give someone (an outsourcer perhaps) access to only some parts of your codebase
- smaller repos are easier to manage
- some CI systems (like AppVeyor or Travis CI) use a single configuration file per repo; stuffing multiple projects into that file only complicates the build and obscures the results
Powering up convert with some scripts
convert is a rather low-level tool that needs a few configuration files and command-line options to work the way you want. And let’s face it: you won’t get it right the first time and will need to do some tweaking.
That’s why I created a few PowerShell scripts and template files to help with the conversion.
Configuration
- Enable the convert extension in mercurial.ini (.hgrc on non-Windows systems):
  [extensions]
  convert =
- Clone or download this gist. I recommend creating a separate directory (and possibly version-controlling it) for every conversion you make and copying these files there.
- Create two files, branchmap.txt and filemap.txt (you may copy them from branchmap.sample.txt and filemap.sample.txt). These are the config files we will use to tell Mercurial which directories to include in the converted repo and how to treat branches. As you will see, these files support an extended syntax compared to what convert understands. They are then used to generate the actual branchmap/filemap files for Mercurial.
Now we need to fill in these config files.
Sample repository
Let’s use the repo at https://bitbucket.org/heavymetaldev/convert-me as an example. The structure looks like this:
|-- convert-me
    |-- .hgignore
    |-- top-secret.txt
    |-- sln
    |   |-- MyProject.Core
    |   |   |-- MyProject.Core.sln
    |   |-- MyProject.Desktop
    |       |-- MyProject.Desktop.sln
    |-- src
        |-- MyProject.Core.Api
        |   |-- MyProject.Core.Api.csproj
        |-- MyProject.Core.Model
        |   |-- MyProject.Core.Model.csproj
        |-- MyProject.Core.Utils
        |   |-- MyProject.Core.Utils.csproj
        |-- MyProject.Desktop.WinForms
            |-- MyProject.Desktop.WinForms.csproj
This repository contains C# projects, but the scripts and methods described here can be applied to any other Mercurial repo as well.
There are two solutions, MyProject.Core and MyProject.Desktop. I want to move these solutions, along with the projects they refer to, into two separate repositories (repo-a and repo-b). Additionally, I want to remove the top-level file top-secret.txt, as it contains confidential data.
Repo A should look like this:
|-- Repo-A
    |-- .hgignore
    |-- sln
    |   |-- MyProject.Core
    |   |   |-- MyProject.Core.sln
    |-- src
        |-- MyProject.Core.Api
        |   |-- MyProject.Core.Api.csproj
        |-- MyProject.Core.Model
        |   |-- MyProject.Core.Model.csproj
        |-- MyProject.Core.Utils
        |   |-- MyProject.Core.Utils.csproj
Repo B should contain the remaining projects and files:
|-- Repo-B
    |-- .hgignore
    |-- sln
    |   |-- MyProject.Desktop
    |       |-- MyProject.Desktop.sln
    |-- src
        |-- MyProject.Desktop.WinForms
            |-- MyProject.Desktop.WinForms.csproj
Filemap on steroids
Let’s start with the filemap. It defines which files or directories should be included in (or excluded from) the new repository. You can also use it to rename files.
The extended filemap format supports lines in the following forms:
# this is the basic Mercurial stuff:
include path/to/file
exclude path/to/file
rename from/file to/file
# this is extended format:
include r:regex/to/.*/include
include r:!regex/to/.*/include/if/not/match
exclude r:regex/to/.*/exclude
exclude r:!regex/to/.*/exclude/if/not/match
include sln:path/to/something.sln
- r: indicates that this entry is a regex.
- r:! is a negated regex (i.e. everything that does not match this pattern).
- sln: is specifically for C# solution files. This will parse the .sln file and generate include entries for every csproj it contains. In other words, this will include the whole solution.
Let’s look at our sample repo. For converting to repo-a, we can use the following filemap.txt content:
include .hgignore
include r:.*/MyProject\.Core(\..*){0,1}/
exclude top-secret.txt
By default, everything that’s not included gets excluded, so the last line isn’t really necessary, but we’ll leave it there to make the intent explicit.
This will generate the following filemap.gen.txt for Mercurial to use:
include ".hgignore"
include "sln/MyProject.Core"
include "src/MyProject.Core.Api"
include "src/MyProject.Core.Model"
include "src/MyProject.Core.Utils"
include "src/MyProject.Core.Api/App_Data"
include "src/MyProject.Core.Api/App_Start"
include "src/MyProject.Core.Api/Controllers"
include "src/MyProject.Core.Api/Models"
include "src/MyProject.Core.Api/Properties"
include "src/MyProject.Core.Api/Service References"
include "src/MyProject.Core.Api/Service References/Application Insights"
include "src/MyProject.Core.Model/Properties"
include "src/MyProject.Core.Utils/Properties"
exclude "top-secret.txt"
Some of these entries are in fact redundant: once we include a directory, there is no need to include all its subdirectories. But since the file is autogenerated, this is nothing to worry about.
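If you would rather keep the generated file tidy anyway, the redundant entries can be pruned. This is a short Python sketch, and prune_redundant_includes is a hypothetical helper, not part of the gist. It assumes repo-relative paths with forward slashes. It relies on the fact that an included directory already covers everything below it.

```python
def prune_redundant_includes(paths):
    """Drop paths that are already covered by another path in the list.

    Sketch, assuming repo-relative paths with forward slashes. An included
    directory covers everything below it, so its subdirectories add nothing.
    """
    kept = []
    for path in sorted(set(paths)):
        # A parent is a prefix of its children, so it sorts before them.
        # Comparing against k + "/" stops "src/MyProject.Core" from
        # swallowing the unrelated sibling "src/MyProject.Core.Api".
        if not any(path.startswith(k + "/") for k in kept):
            kept.append(path)
    return kept


print(prune_redundant_includes([
    "src/MyProject.Core.Api",
    "src/MyProject.Core.Api/Controllers",
    "src/MyProject.Core.Model/Properties",
    "src/MyProject.Core.Model",
    "sln/MyProject.Core",
]))
# ['sln/MyProject.Core', 'src/MyProject.Core.Api', 'src/MyProject.Core.Model']
```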
For repo-b, I will go minimalist and use the sln: prefix:
include .hgignore
include sln:sln/MyProject.Desktop/MyProject.Desktop.sln
exclude top-secret.txt
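Here is one way the sln: entry above could be turned into plain include lines. The Python sketch reads the Project(...) lines of the .sln file and keeps the directory of every .csproj it finds. The project paths inside a .sln are relative to the solution’s directory, so they are resolved against it. The function name and PROJECT_LINE pattern are hypothetical. Keeping the solution’s own directory is also my assumption, based on repo-b still containing sln/MyProject.Desktop.

```python
import posixpath
import re

# Captures the project path from a .sln "Project(...)" line, e.g.
# Project("{GUID}") = "Name", "..\..\src\Name\Name.csproj", "{GUID}"
PROJECT_LINE = re.compile(r'^Project\("[^"]*"\)\s*=\s*"[^"]*",\s*"([^"]+)"')

def sln_includes(sln_path, sln_text):
    """Turn one "include sln:..." entry into plain include lines.

    Hypothetical sketch: sln_path is repo-relative with forward slashes,
    and project paths inside the .sln are relative to the solution's
    directory.
    """
    sln_dir = posixpath.dirname(sln_path)
    dirs = [sln_dir]  # assumption: keep the solution's own directory too
    for line in sln_text.splitlines():
        m = PROJECT_LINE.match(line.strip())
        if not m or not m.group(1).lower().endswith(".csproj"):
            continue  # skip solution folders and non-C# projects
        rel = m.group(1).replace("\\", "/")
        project = posixpath.normpath(posixpath.join(sln_dir, rel))
        dirs.append(posixpath.dirname(project))
    # dict.fromkeys removes duplicates while keeping the original order.
    return [f'include "{d}"' for d in dict.fromkeys(dirs)]
```

For sln/MyProject.Desktop/MyProject.Desktop.sln, the project line for MyProject.Desktop.WinForms produces include "src/MyProject.Desktop.WinForms". Solution folders are skipped because their path does not end in .csproj.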
Conversion Process
We will use the hg-convert.ps1 script to do the conversion. Sample usage:
PS> .\hg-convert path/to/source/convert-me path/to/target/repo-a -startrev 123
This script takes care of configuring and calling hg convert. It will:
- Take filemap.txt (if it exists), generate filemap.gen.txt and pass it to convert
- Take branchmap.txt (if it exists), generate branchmap.gen.txt and pass it to convert (more on branchmaps later)
- Check whether the target repository already exists (use -force to overwrite it)
- Convert the repository at path/to/source/convert-me, starting at revision 123, and save it at path/to/target/repo-a
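For readers who want to see what such a wrapper boils down to, here is a rough Python counterpart of those steps. The --filemap and --branchmap options and the convert.hg.startrev setting are documented parts of the convert extension. The helper names build_convert_command and run_convert are hypothetical, and the real PowerShell script may assemble the call differently. The step that generates the .gen.txt files is left out here.

```python
import os
import shutil
import subprocess

def build_convert_command(source, target, startrev=0, filemap=None, branchmap=None):
    """Assemble the hg convert call; map files are passed only if given."""
    args = ["hg", "convert", "--config", f"convert.hg.startrev={startrev}"]
    if filemap:
        args += ["--filemap", filemap]
    if branchmap:
        args += ["--branchmap", branchmap]
    return args + [source, target]

def run_convert(source, target, startrev=0, force=False):
    """Hypothetical Python counterpart of the steps listed above."""
    # Use the generated map files only when they exist.
    filemap = "filemap.gen.txt" if os.path.exists("filemap.gen.txt") else None
    branchmap = "branchmap.gen.txt" if os.path.exists("branchmap.gen.txt") else None
    if os.path.exists(target):
        if not force:
            raise FileExistsError(f"{target} already exists; use force=True to overwrite")
        shutil.rmtree(target)  # -force: start again from an empty target
    subprocess.run(build_convert_command(source, target, startrev, filemap, branchmap),
                   check=True)
```

Keeping the command builder separate from the code that deletes and runs things makes it easy to print and inspect the exact hg convert call before running it.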
The -startrev parameter specifies the revision at which the conversion should start (that revision and all of its descendants are converted). If you specify 0 (the default), it will convert the whole repository, which may take a considerable time if the repo is big. For testing purposes, I recommend starting with the latest revision. This way, only that one revision is converted and you can check whether you have included everything you need in the filemap. My process is as follows (it should save you some time and frustration):
1. Set up the filemap.
2. Convert only the newest revision, using the -startrev parameter, e.g.:
   PS> .\hg-convert ../convert-me ../repo-a -startrev 55
3. Check the converted repository: try to build everything.
4. Copy missing files from the old repo to the new repo and add them to the filemap, until the new repo builds properly.
5. Repeat from step 2 until I get it right.
6. Start the full conversion from revision 0:
   PS> .\hg-convert ../convert-me ../repo-a -startrev 0
If everything goes right, we now have two separate repositories, repo-a and repo-b. Notify the other developers of the change so that no one pushes to the old repo (renaming or removing it is also a good way to prevent this).
But wait, there are a few other scenarios we should cover.
Automated branchmap
The branchmap defines the mapping between branch names in the old repo and the new repo. branchmap.txt supports the following line formats:
# this is the basic Mercurial stuff:
original_branch_name new_branch_name
# this is extended format:
r:release/.* release
r:!release default
* default
As in the filemap, r: and r:! denote a regex that a branch name must match or must not match. A single * means, you guessed it, “everything”.
For example, my branchmap.txt could look like this:
* default
r:release/.* release
dev dev
All branches that match the release/.* pattern will be renamed to release. Branch dev will remain dev. Everything else will be renamed to default.
Note that order matters here: if a branch matches multiple patterns, the last one always wins, so start with the most generic one. If you write * default at the end of the file, everything before it will effectively be ignored. You may want to inspect branchmap.gen.txt to check that everything looks the way you wanted.
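The last-match-wins rule is easiest to see in code. This Python sketch turns an extended branchmap into the plain "old new" lines that Mercurial’s branchmap expects, given a list of branch names from the source repo. The function names are hypothetical. I am also assuming the regexes are searched rather than fully matched, which the article does not specify.

```python
import re

def rule_matches(pattern, branch):
    """Match one extended branchmap pattern against a branch name."""
    if pattern == "*":
        return True
    if pattern.startswith("r:!"):
        return re.search(pattern[3:], branch) is None
    if pattern.startswith("r:"):
        return re.search(pattern[2:], branch) is not None
    return pattern == branch  # plain Mercurial line: exact branch name

def generate_branchmap(branchmap_text, branches):
    """Produce plain "old new" lines; for each branch the LAST matching rule wins."""
    rules = []
    for line in branchmap_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            pattern, new_name = line.split(None, 1)
            rules.append((pattern, new_name.strip()))
    result = []
    for branch in branches:
        new_name = None
        for pattern, target in rules:
            if rule_matches(pattern, branch):
                new_name = target  # later rules override earlier ones
        if new_name is not None:
            result.append(f"{branch} {new_name}")
    return result


example = "* default\nr:release/.* release\ndev dev\n"
for line in generate_branchmap(example, ["default", "dev", "release/1.0", "feature/x"]):
    print(line)
# default default
# dev dev
# release/1.0 release
# feature/x default
```

Moving "* default" to the end of the example would map every branch to default, which is the pitfall described above.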
Appending revisions to existing repo
The last thing I want to mention is appending newly converted history onto an existing converted repo. Let’s go back to our convert-me repo. The news about switching to the new repositories hasn’t reached one developer, who has just pushed some critical changes to MyProject.Core.Model and MyProject.Desktop.WinForms in the convert-me repo (instead of repo-a and repo-b, respectively). Let’s call them “offending changes”. How do we transfer these changes to the new repos without breaking anything? Run convert again? That would recreate these repositories, effectively breaking them for everyone who has already cloned them.
convert gives us a way to append parts of converted history to an existing repo, and this is exactly what we need here. We will:
- Identify the offending changes (starting from the first revision that hasn’t been converted before)
- Check the parent of the offending changes and find the corresponding commits in repo-a and repo-b
- Convert the offending changes and append them onto these corresponding parent commits, using the same filemaps and branchmaps to keep only the required files
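One way to do the append step with convert itself is its documented --splicemap option. It lets you set the parents of a converted revision, and a parent may be a hash from either the source or the destination repo. Each splicemap line has the form "child parent1[, parent2]", and for Mercurial repos these must be full 40-character hashes. The sketch below assumes this approach, since the article doesn’t show how the gist does it. The helper names, file names and revision values are hypothetical.

```python
def splicemap_entry(child, parents):
    """Format one splicemap line: "<child> <parent1>[, <parent2>]".

    child: full hash of the first offending changeset in the source repo.
    parents: one or two full hashes (source or destination) to use as its
    new parents, e.g. the matching commit already present in repo-a.
    """
    if not 1 <= len(parents) <= 2:
        raise ValueError("a splicemap entry takes one or two parents")
    return f"{child} {', '.join(parents)}"

def append_command(source, target, first_rev, splicemap,
                   filemap="filemap.gen.txt", branchmap="branchmap.gen.txt"):
    """hg convert call that appends to an EXISTING target repo (no -force)."""
    return ["hg", "convert",
            "--config", f"convert.hg.startrev={first_rev}",
            "--filemap", filemap, "--branchmap", branchmap,
            "--splicemap", splicemap,
            source, target]
```

You would write one splicemap_entry line per target repo into a splicemap file. Then run the append_command result once for repo-a and once for repo-b, each with its own filemap and branchmap.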
That’s it. I hope you find this helpful, and if you have any problems with the scripts, please drop me a line!
Resources
- https://www.mercurial-scm.org/wiki/ConvertExtension
- http://hgtip.com/tips/advanced/2009-11-16-using-convert-to-decompose-your-repository/
- https://gist.github.com/qbikez/e900456032833fb2baaaee87e19a8ccd