Migrating Codebases From SVN to Git
In this blog post, I share the method and tools we used to migrate our codebases from SVN to Git at Kartaca, along with what we learned along the way. The migration consisted of the following steps:
1. Creating the “authors” file
2. Defining the migration structure of the codebases that are more complex than the standard directory layout
3. Migrating the defined codebases to a Git repository with a tool
4. Checking if the migration of repository is done correctly and dealing with required additional arrangements
5. Pushing the generated repository to its counterpart in Git and creating access for the project team
6. Cloning the newly created repository to the team members and allowing them to do tests and other activities for learning purposes, according to the request of the project team
7. If there is a version script, making it point to the new repository
8. Making CI/CD tools point to the new repository
9. (If there was an update after completing steps 6, 7 and 8) Deleting the repository and repeating steps 3 to 5
10. Closing down the SVN codebase
1. Creating the “authors” file
In SVN repositories, user names appear in the logs as author information. For the migration, we need to prepare a map file that gives the full name and email address for each SVN user name. This file should contain lines like the following:
serdar = Serdar Yuksel <firstname.lastname@example.org>
Here is a how-to guide for preparing this file.
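A skeleton of this file can be generated from the SVN log and then corrected by hand. The sketch below is only one possible approach: the function name authors_skeleton and the example.org email domain are placeholders of mine, and it assumes the usual `svn log -q` line format (revision | user | date).

```shell
# Turn `svn log -q` output (read from stdin) into a skeleton authors file.
# Each SVN user name appears once; real names and emails must be filled in by hand.
authors_skeleton() {
  awk -F'|' '/^r[0-9]+ / {
    name = $2
    gsub(/^ +| +$/, "", name)          # trim the spaces around the user name
    if (name != "") print name " = " name " <" name "@example.org>"
  }' | sort -u
}

# Usage (the URL is the placeholder address used later in this post):
#   svn log -q https://svndepoadresi/foo | authors_skeleton > kartaca_authors.txt
```

After generating the file, each right-hand side still has to be edited into the real "Name Surname <email>" form.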
2. Defining the directories
Fundamentally, we define each directory with trunk/branches/tags subdirectories as a separate repository. Usually, this doesn’t make much difference in the way we use it; only the address changes.
Example: Codebase => foo
The existing trunk/branches/tags directories in the foo codebase will create a foo repository, and bar under subprojects will be a separate bar repository with its own trunk/branches/tags directories.
Although baz under deps doesn’t have trunk/branches directories, we will create it as a separate repository.
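To find which directories have the standard layout, one option is to scan a recursive listing of the SVN repository for trunk/ directories. The following is a minimal sketch, assuming the listing comes from `svn ls -R` (directories end with a slash); the function name list_layout_roots is hypothetical. Directories without a trunk, such as deps/baz, are not reported and still have to be picked out manually.

```shell
# Read `svn ls -R` output on stdin and print every directory that
# directly contains a trunk/ subdirectory (a candidate Git repository).
list_layout_roots() {
  awk '
    # Match directories named trunk/ that are not nested inside
    # another trunk/, branches/ or tags/ directory.
    /(^|\/)trunk\/$/ && $0 !~ /(^|\/)(trunk|branches|tags)\/./ {
      sub(/trunk\/$/, "")
      print ($0 == "" ? "./" : $0)
    }'
}

# Usage:
#   svn ls -R https://svndepoadresi/foo | list_layout_roots
```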
3. Important notes about migration
- Empty directories: Git is a tool that manages changes; it is not a file system. Empty directories therefore have no use in it, and most tools don’t bring them over during migration. git-svn can bring the empty directories if we ask it to; however, according to this article, that slows the process down. If you need the empty directories, it is quite easy to create them later, as described in the same article.
- Branches and tags: Most tools cannot create branches and tags correctly, and you might need to edit them later. As far as I can see, subgit can do it.
- Ignore files: Most tools do not even check ‘ignore files’ during migration, but reposurgeon and subgit do migrate them. However, please take into account that if you used the SVN repository with git-svn and created a .gitignore, subgit will override it during migration.
- SVN revision numbers: You may want to display SVN revision numbers in the commit logs for information purposes (I did, for example). To do this in reposurgeon, we need to pass the --legacy option to the write command. In subgit, we can add revision numbers with an additional step, as explained here.
- Timezone information in the logs: After the migration, unfortunately, the timezone information in the commit logs is lost (although we have it in SVN). If you want, you can add an offset to all commit logs with reposurgeon, but I did not do it. With git log --date=local, we can see the dates in the local timezone.
- I wrote down my progress on the migration in the “General Notes on the Process” section below (the last section of this blog post). You can get an idea about the tools I used and more from there.
⇒ Now I will elaborate on the commands I used for the two tools I tried. (I didn’t add git-svn as it failed.)
We have directories trunk/branches/tags in the codebase foo, bar under subprojects and baz under deps.
Firstly, we want to import the trunk/branches/tags directories into a separate Git repository as foo. (I copied the SVN repository to the local file system because it is large, but this is not really necessary.)
$ /home/serdar/local/subgit-3.2.6/bin/subgit import --svn-url file:///home/serdar/svn-to-git/svn/foo/ --authors-file ../kartaca_authors.txt --trunk trunk --branches branches --tags tags foo.git
This command created the migrated git repository in the foo.git directory.
Then, as explained here, I cloned the repository as a mirror and edited to add the revision numbers:
$ git clone --mirror foo.git foo-clone
$ cd foo-clone
$ git filter-branch --msg-filter '
    REV=$(git log --format="%N" $GIT_COMMIT -1)
    cat
    echo ""
    echo -n "$REV"
' -- --branches --tags
I migrated the other repositories in foo (subprojects/bar and deps/baz) as follows:
$ /home/serdar/local/subgit-3.2.6/bin/subgit import --svn-url https://svndepoadresi/foo/subprojects/bar/ --authors-file ../kartaca_authors.txt --trunk trunk --branches branches --tags tags bar.git
$ /home/serdar/local/subgit-3.2.6/bin/subgit import --svn-url https://svndepoadresi/foo/deps/ --authors-file ../kartaca_authors.txt --trunk baz baz.git
As there were no directories like trunk, etc. in baz, I could have migrated it as a single branch. However, subgit expects you to specify at least one trunk in a repository. I’ve overcome this by indicating the repository address as deps and the directory baz as a trunk.
I should have migrated moo – another repository with no directories like trunk, etc. – as a master, but I couldn’t specify a trunk as it didn’t have a parent directory. Therefore I could only migrate it using reposurgeon.
To use this tool (assuming reposurgeon directory is added to $PATH), firstly we create a directory and run the below command in it:
$ mkdir qux-reposurgeon
$ cd qux-reposurgeon
$ repotool initialize qux
reposurgeon then creates the files needed for the migration. We edit the Makefile, correct the REMOTE_URL address, copy the authors map file, and run make.
$ cp /home/serdar/svn-to-git/kartaca_authors.txt qux.map
$ make
The tool creates a migrated directory called qux-git. Afterward, we may need to clean the tags and branches.
If we want to add SVN revision numbers, we need to pass the --legacy option to the write command in the Makefile (at the end of the line where the reposurgeon commands are given), for example:
# Build the second-stage fast-import stream from the first-stage stream dump
qux.fi: qux.svn qux.opts qux.lift qux.map $(EXTRAS)
	$(REPOSURGEON) $(VERBOSITY) "script qux.opts" "read $(READ_OPTIONS) <qux.svn" "authors read <qux.map" "sourcetype svn" "prefer git" "script qux.lift" "legacy write >qux.fo" "write --legacy >qux.fi"
With this parameter, revision numbers come under the commit logs as “Legacy-ID: XXX“.
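Once the revision numbers are in the commit messages, they can be used to look up the Git commit for a given SVN revision. A minimal sketch, assuming the “Legacy-ID: XXX” line sits on its own line in the message as described above; the function name commit_for_revision is mine.

```shell
# Print the hash of the commit whose message carries the given SVN revision.
# $1 = path to the migrated Git repository, $2 = SVN revision number
commit_for_revision() {
  # The trailing $ anchor keeps revision 12 from also matching 123.
  git -C "$1" log --all --format='%H' --grep="^Legacy-ID: $2\$"
}

# Usage:
#   commit_for_revision qux-git 3200
```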
For moo, a repository that should be imported as master only, I needed to add the branchify nobranch command before the read command. (Otherwise, reposurgeon looks for a trunk directory and, if it cannot find one, imports nothing.)
# Build the second-stage fast-import stream from the first-stage stream dump
moo.fi: moo.svn moo.opts moo.lift moo.map $(EXTRAS)
	$(REPOSURGEON) $(VERBOSITY) "script moo.opts" "branchify nobranch" "read $(READ_OPTIONS) <moo.svn" "authors read <moo.map" "sourcetype svn" "prefer git" "script moo.lift" "legacy write >moo.fo" "write >moo.fi"
The reposurgeon documentation is very long, but it may be useful if needed. (I hope you won’t need it, but I still add it here.)
4. Checking if the migration of repository is done correctly and dealing with required additional arrangements
I couldn’t find a verification method that is both automatic and good. I visually inspected the commit logs, tags and branches, the directory structure, and the last version of the files (comparing the directory checked out from SVN with the directory cloned from Git using diff -r).
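The file comparison can be sketched as follows. This assumes both trees are checked out at the same version (for example, SVN trunk against the Git master branch); the function name compare_trees is hypothetical. The metadata directories are excluded because they differ by design.

```shell
# Compare an SVN checkout with a Git clone, ignoring VCS metadata.
# $1 = SVN working copy, $2 = Git working tree
# Exit status is 0 when the trees are identical, non-zero otherwise.
compare_trees() {
  diff -r -x .svn -x .git "$1" "$2"
}

# Usage:
#   compare_trees foo-svn-checkout foo-git-clone
```

Directories that exist on only one side, including empty directories that were not migrated, show up as “Only in ...” lines in the output.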
In any case, it is necessary to compare the existing SVN checkout directory with the migrated Git working tree using diff -r. Bearing in mind that the tools do not migrate empty directories, this is the point where we should check and decide whether we need them. If necessary, here is an explanation of the method to add empty directories.
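If the empty directories are needed, one common workaround is to recreate them in the Git working tree with a placeholder file, since Git only tracks files. The sketch below is an illustration rather than the method from the linked article: the function name is mine, and .gitkeep is just a naming convention, not a Git feature. It assumes a find that supports -empty (GNU and BSD find do).

```shell
# Recreate the empty directories of an SVN checkout inside a Git working tree.
# $1 = SVN working copy, $2 = Git working tree
recreate_empty_dirs() {
  # List empty directories relative to the checkout, skipping .svn metadata.
  (cd "$1" && find . -name .svn -prune -o -type d -empty -print) |
    while IFS= read -r dir; do
      # Git ignores empty directories, so add a placeholder file to each one.
      mkdir -p "$2/$dir" && : > "$2/$dir/.gitkeep"
    done
}

# Usage:
#   recreate_empty_dirs foo-svn-checkout foo-git-clone
#   then review with `git status`, add and commit
```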
If we use subgit, the only thing left to do is adding the SVN revision (if desired). Other than that, there is nothing else left to do before pushing the repository.
If we use reposurgeon, we need to clear the tags created for the empty commits and delete the undeleted branches again.
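Cleaning up such tags can be scripted. The following is a sketch under assumptions: the function name is hypothetical, and the emptycommit-* pattern in the usage line is my assumption about how reposurgeon names these tags, so list the tags with git tag -l first and adjust the pattern to what you actually see.

```shell
# Delete every tag in a repository whose name matches a glob pattern.
# $1 = path to the Git repository, $2 = tag pattern
delete_tags_matching() {
  git -C "$1" tag -l "$2" |
    while IFS= read -r tag; do
      git -C "$1" tag -d "$tag" >/dev/null
    done
}

# Usage (the pattern is an assumption, check `git tag -l` first):
#   delete_tags_matching qux-git 'emptycommit-*'
```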
The command below is very useful for reducing the size of the final version of the repository:
$ cd foo-clone
$ git gc --aggressive
5. Pushing the repository to its counterpart in Git and creating access for the project team
We used Bitbucket as the Git repository.
To do this step, you must first create the repository in the Bitbucket project. For example, the repository address for foo is:
Then I ran the commands below on the clone (created with subgit, mirrored, annotated with SVN revisions, and shrunk with git gc) to push it to the Bitbucket repository:
$ cd foo-clone
$ git remote rm origin
$ git remote add origin ssh://git@gitrepository/foo.git/scm/cdt/foo.git
$ git push --all origin
$ git push --tags origin
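After pushing, it can be worth checking that every branch and tag actually arrived. A minimal sketch that compares the local refs with what the remote reports; the function name refs_match is hypothetical, and it needs a Git version that supports the -C option.

```shell
# Succeed only if the remote has exactly the same branches and tags
# (pointing at the same objects) as the local repository.
# $1 = path to the local repository, $2 = remote name or URL
refs_match() {
  local_refs=$(git -C "$1" for-each-ref \
    --format='%(objectname) %(refname)' refs/heads refs/tags | sort)
  # Drop the peeled "^{}" entries that ls-remote prints for annotated tags.
  remote_refs=$(git -C "$1" ls-remote --heads --tags "$2" |
    grep -v -F '^{}' | awk '{print $1, $2}' | sort)
  [ "$local_refs" = "$remote_refs" ]
}

# Usage:
#   refs_match foo-clone origin && echo "all branches and tags are on the remote"
```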
At Kartaca, we manage project access with LDAP. We granted access while creating the projects. The team kept accessing the existing codebases and projects the same way; all we had to do was grant them access.
At this point, we are almost ready to close down SVN and start using the repository on Bitbucket. The only blockers are the things that still depend on the SVN repository, such as the Bamboo configuration and the version script (items 7 and 8), or the owner of the codebase wanting the team to gain some more experience first and deciding to do item 6.
If the team members still have uncommitted changes, they can copy all the files in their directory to the newly cloned directory (overwrite) and delete the .svn directories so that they start using the Bitbucket repository.
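The copy-and-clean step can be sketched like this; the function name carry_over_changes is mine. Note that it only carries over new and modified files: a file deleted in the SVN working copy still has to be removed by hand in the Git clone.

```shell
# Copy a working copy with uncommitted changes over a fresh Git clone,
# then remove the SVN metadata directories that came along.
# $1 = SVN working copy, $2 = freshly cloned Git working tree
carry_over_changes() {
  cp -R "$1"/. "$2"/ &&
    find "$2" -type d -name .svn -prune -exec rm -rf {} +
}

# Usage:
#   carry_over_changes foo-svn-checkout foo-git-clone
#   then run `git status` in foo-git-clone to review the changes
```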
6. Cloning the newly created repository to the team members and allowing them to do tests and other activities for learning purposes, according to the request of the project team
Once we have the migrated repository on Bitbucket and we have created access for the project team, we can continue using SVN and test the Bitbucket repository at the same time. The team can clone this new repository to their local environment and work on it to learn.
As the migration methods I mentioned are one-time processes, we need to perform the migration again after the tests are finished, because SVN will have received updates in the meantime. This means repeating steps 3, 4, and 5. (In Bitbucket, the repository in the project can be deleted and created again.) The migration usually does not take long, so there is no harm in doing this.
7. and 8. If there is a version script or CI/CD tools, making them point to the new repository
Our purpose at this stage is to make sure that we point anything that is still dependent on SVN to the new git repository.
The first thing we think of is the version script and the configuration of Bamboo, the tool we use for CI/CD. You may have other issues depending on your projects. We assume that the owner of the codebase is aware of all issues, and proceed.
Since the Bamboo configuration is the responsibility of our Product and System Administrator, we need to coordinate with him/her too. We cannot complete the migration before completing these steps.
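To catch anything that still points at SVN, one simple check is to search scripts and configuration files for the old repository address. A sketch, assuming a grep that supports --exclude-dir (GNU and BSD grep do); the function name find_svn_references is hypothetical, and the URL in the usage line is the placeholder address used earlier in this post.

```shell
# List the text files under a directory that still mention the old SVN URL.
# $1 = directory to search, $2 = old SVN repository URL
find_svn_references() {
  # -F: treat the URL as a plain string, -I: skip binary files,
  # -l: print only the names of matching files.
  grep -rIlF --exclude-dir=.git --exclude-dir=.svn -e "$2" "$1"
}

# Usage:
#   find_svn_references foo-git-clone https://svndepoadresi/foo
```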
10. Closing down the SVN codebase
Once the owner of the codebase and the Product and System Administrators decide together that we can start using the new repository (and we complete item 9, if necessary), we stop using the SVN repository and switch to Bitbucket. The owner of the codebase then informs the Product and System Administrators that the SVN repository can be archived.
General Notes on the Process
Since we were going to use Bitbucket, I followed the Bitbucket Migration Guide. The method shown in that document uses git-svn.
When I tried to migrate the codebase foo with git-svn, it took me about 4 hours and resulted in an error. But when I checked, it seemed as if it created the repository. So I moved on to the cleaning step (creation of tags and branches, etc.) using Atlassian’s tool, but it didn’t work.
“It makes sense to use git-svn as a bridge, but it doesn’t make much sense for migration.“
Therefore, though I was reluctant at first, I started searching for other tools.
At first glance, Eric Raymond’s tool reposurgeon looked pretty complicated to me. That’s why I started with subgit, which I had used for migrating from SVN to Git before and found easy to use. I managed to migrate the repository foo, with multiple branches and 3200 revisions, successfully (except for the time it took: 7 hours!). For some reason, migration did not take nearly as long with other repositories, though they had many revisions as well.
Out of respect for Eric Raymond, I decided to take his opinion into account and try reposurgeon as well. It worked incredibly fast, but unfortunately, it failed to complete the migration of the codebase foo. I couldn’t figure out why.
I couldn’t see much of a problem in the codebase foo after migrating with subgit, so I decided to move on with it, but I tried reposurgeon with other repositories. Within minutes, I was able to migrate a few codebases, some of which included about 8000 revisions. Due to the intensive use of branches and tags in one of the codebases, I needed to do some more work after the migration. (In SVN, deleting a branch means deleting a directory. As reposurgeon creates tags for such commits, we can use these tags to see which commits deleted directories and branches.)
Apart from the codebase foo, I’ve successfully migrated several other codebases within minutes using subgit. (I couldn’t figure out why foo took so long to migrate.) For all these repositories, subgit did not leave me with any extra work, except for adding the SVN revision numbers and running git gc.
I tried to migrate moo as a single repository without branches. I couldn’t do it with subgit; I could do it with reposurgeon using the branchify command.
In my opinion, subgit was the most hassle-free tool. (By the way, Bitbucket and GitLab also use subgit for import and gateway jobs.) I recommend starting with subgit and, if it doesn’t work, moving on to reposurgeon or another tool.