Migrating Codebases From SVN to Git

In this blog post, I share the method and tools we used to migrate our codebases from SVN to Git at Kartaca, together with what we learned during the migration.

Basic Method

1. Create the “authors” file
2. Define the migration structure of the codebases that are more complex than the standard directory layout
3. Migrate the defined codebases to a Git repository with a tool
4. Check that the repository was migrated correctly and deal with any required additional arrangements
5. Push the generated repository to its counterpart in Git and create access for the project team
6. Clone the newly created repository to the team members' machines and let them do tests and other activities for learning purposes, if the project team requests it
7. If there is a version script, point it to the new repository
8. Make CI/CD tools point to the new repository
9. Delete the repository and repeat steps 3 to 5 (if there was an update after completing steps 6, 7, and 8)
10. Close down the SVN codebase

1. Create the “authors” file

In SVN repositories, user names appear in the logs as author information. For the migration, we need to prepare a map file that gives the name and email address corresponding to each user name. This file should contain lines such as the following:

serdar = Serdar Yuksel <serdar.yuksel@geemail.com>

Here is a how-to guide for preparing this file.
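A starting point for this file can be generated from the SVN log. The sketch below is not taken from the guide mentioned above. It assumes the usual `svn log --quiet` line format (`r12 | user | date`), and both the function name and the `example.com` addresses are placeholders, so every generated line still has to be corrected by hand.

```shell
# Reads `svn log --quiet` output on stdin and prints one skeleton line
# "user = user <user@example.com>" per distinct SVN author.
authors_skeleton() {
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' \
        | LC_ALL=C sort -u \
        | while IFS= read -r user; do
              # Name and address are placeholders to be edited manually.
              printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
          done
}

# Usage (the repository URL is only an example):
#   svn log --quiet file:///home/serdar/svn-to-git/svn/foo/ \
#       | authors_skeleton > kartaca_authors.txt
```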

2. Define the directories

Fundamentally, we define each directory that has trunk/branches/tags subdirectories as a separate repository. Usually, this doesn’t make much difference in how we use it; only the address changes.

Example: Codebase => foo

The existing trunk/branches/tags directories in the foo codebase create a foo repository. The bar under subprojects is a separate bar repository with its own trunk/branches/tags directories.

Although baz under deps doesn’t have trunk/branches directories, we create it as a separate repository.
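In a large codebase, one way to find the directories that follow the standard layout is to list the repository recursively and look for trunk directories. This is only a rough sketch: it assumes the listing comes from `svn ls -R` (where directory entries end with `/`), the function name is made up for illustration, and directories without a trunk, such as baz, still have to be identified by hand.

```shell
# Reads an `svn ls -R` listing on stdin and prints every directory that
# contains a trunk/ subdirectory ("." stands for the root of the codebase).
find_standard_layouts() {
    grep -E '(^|/)trunk/$' \
        | sed -E -e 's#/?trunk/$##' -e 's#^$#.#' \
        | LC_ALL=C sort -u
}

# Usage (placeholder URL):
#   svn ls -R https://svndepoadresi/foo/ | find_standard_layouts
```

Note that a directory named trunk nested inside another trunk would also be reported, so the output is a list of candidates rather than a final answer.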

3. Important notes about migration

Empty directories: Git is a tool that manages changes; it is not a file system. Empty directories therefore have no use in it, and most tools don’t bring them over during migration. git-svn can bring them over if we ask it to (with its --preserve-empty-dirs option), although, according to this article, that slows down the process. If you need the empty directories, it is quite easy to create them later, as described in the same article.
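As an illustration of creating them afterwards, here is a minimal sketch that is not necessarily the method from the article. It assumes that an SVN checkout and a Git clone of the same codebase exist side by side, and it uses a `.gitkeep` placeholder file, which is a common naming convention rather than a Git feature. The function name is hypothetical.

```shell
# Recreates the directories that are empty in an SVN working copy inside a
# Git working tree, adding a placeholder file so that Git can track them.
restore_empty_dirs() {
    svn_wc=$1   # path to the SVN checkout
    git_wt=$2   # path to the Git clone
    (cd "$svn_wc" && find . -name .svn -prune -o -type d -empty -print) |
    while IFS= read -r dir; do
        mkdir -p "$git_wt/$dir"
        : > "$git_wt/$dir/.gitkeep"   # .gitkeep is only a convention
    done
}

# Usage: restore_empty_dirs foo-svn-checkout foo-git-clone
```

The new placeholder files still have to be added and committed in the Git working tree.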

Branches and tags: Most tools cannot create branches and tags correctly, and later you might need to edit those. As far as I can see, subgit can do it.

Ignore files: Most tools do not even check ‘ignore files’ during migration. But reposurgeon and subgit do migrate them. However, please consider that if you use the SVN repository with git-svn and create .gitignore, subgit overrides it during migration.

SVN revision numbers: You may want to display SVN revision numbers on the commit logs for information purposes (for example, I would). To do this in reposurgeon, we need to pass the legacy parameter to the write command. And in subgit, we can add revision numbers with an additional step, as explained here.

Timezone information in the logs: Unfortunately, the timezone information in the commit logs is lost after the migration (although we have it in SVN). If you want, you can add an offset to all commit logs with reposurgeon, but I did not do it. With git log --date=local, we can see the dates in our local timezone.
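As a small illustration of the last point, the sketch below wraps git log --date=local so that the dates are shown in a timezone of our choice. The function name is made up, and the output format is just one possible choice.

```shell
# Prints "author date | subject" for each commit, with the dates converted
# to the given timezone instead of the offset stored in the commit.
log_in_timezone() {
    tz=$1
    shift
    TZ="$tz" git log --date=local --format='%ad | %s' "$@"
}

# Usage, inside the migrated repository (last five commits):
#   log_in_timezone Europe/Istanbul -5
```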

I wrote down how the migration went in the “General Notes on the Process” section below (the last section of this blog post). You can get an idea of the tools I used, and more, from there.

⇒ Below, I go through the commands I used with the two tools I tried. (I left out git-svn because it failed.)

Source: https://subgit.com/

We have directories trunk/branches/tags in the codebase foo, bar under subprojects, and baz under deps.

First, we want to import the trunk/branches/tags directories into a separate Git repository named foo. (I copied the SVN repository to the local file system because it is large, but this is not necessary.)

/home/serdar/local/subgit-3.2.6/bin/subgit import \
    --svn-url file:///home/serdar/svn-to-git/svn/foo/ \
    --authors-file ../kartaca_authors.txt \
    --trunk trunk \
    --branches branches \
    --tags tags \
    foo.git

This command created the migrated Git repository in the foo.git directory.

Then, as explained here, I cloned the repository as a mirror and rewrote the commit messages to add the revision numbers:

$ git clone --mirror foo.git foo-clone
$ cd foo-clone
$ git filter-branch --msg-filter '
    # %N prints the notes attached to the commit
    REV=$(git log --format="%N" "$GIT_COMMIT" -1)
    # original message, then a blank line, then the revision information
    cat
    echo
    printf "%s" "$REV"
' -- --branches --tags

I migrated the other repositories in foo (subprojects/bar and deps/baz) as follows:

$ /home/serdar/local/subgit-3.2.6/bin/subgit import --svn-url https://svndepoadresi/foo/subprojects/bar/ --authors-file ../kartaca_authors.txt --trunk trunk --branches branches --tags tags bar.git
$ /home/serdar/local/subgit-3.2.6/bin/subgit import --svn-url https://svndepoadresi/foo/deps/ --authors-file ../kartaca_authors.txt --trunk baz baz.git

As there were no directories like trunk, etc. in baz, I could have migrated it as a single branch. However, subgit expects you to specify at least one trunk in a repository. I worked around this by giving deps as the repository address and the baz directory as the trunk.

moo, another repository with no directories like trunk, etc., should also have been migrated as a single master branch, but I couldn’t specify a trunk for it in subgit because it has no parent directory. Therefore, I could only migrate it using reposurgeon.

Visit Reposurgeon.

To use this tool (assuming the reposurgeon directory is on $PATH), we first create a directory and run the following command in it:

$ mkdir qux-reposurgeon
$ cd qux-reposurgeon
$ repotool initialize qux

Then reposurgeon creates the necessary files for migration. We edit the Makefile, correct the REMOTE_URL address, copy the authors map file, and run make.

$ cp /home/serdar/svn-to-git/kartaca_authors.txt qux.map
$ make

The tool creates a migrated directory called qux-git. Afterward, we may need to clean the tags and branches.

If we want to add SVN revision numbers, we need to pass the legacy parameter to the write command in the Makefile (where the reposurgeon commands are given at the end), for example:

# Build the second-stage fast-import stream from the first-stage stream dump 
qux.fi: qux.svn qux.opts qux.lift qux.map $(EXTRAS)        
     $(REPOSURGEON) $(VERBOSITY) "script qux.opts" "read $(READ_OPTIONS) qux.fo" "write --legacy >qux.fi"

With this parameter, the revision numbers appear at the end of the commit messages as “Legacy-ID: XXX”.
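Once the revision numbers are in the commit messages, they can be used to look up the Git commit that corresponds to an SVN revision. A minimal sketch, assuming the “Legacy-ID: XXX” line format shown above and a hypothetical function name:

```shell
# Prints "<short hash> <subject>" for the commit(s) whose message contains
# a "Legacy-ID:" line with exactly the given SVN revision number.
# Run it inside the migrated repository.
commit_for_revision() {
    git log --all --format='%h %s' --grep="^Legacy-ID: $1\$"
}

# Usage: commit_for_revision 3200
```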

I needed to use the branchify nobranch command before the read command for moo, a repository that should only be taken as master. (Otherwise, reposurgeon looks for a trunk directory and, if it cannot find one, imports nothing.)

# Build the second-stage fast-import stream from the first-stage stream dump 
moo.fi: moo.svn moo.opts moo.lift moo.map $(EXTRAS)       
      $(REPOSURGEON) $(VERBOSITY) "script moo.opts" "branchify nobranch" "read $(READ_OPTIONS) moo.fo" "write >moo.fi"

The reposurgeon documentation is very long, but it may be useful if you need it. (I hope you won’t, but I add it here anyway.)

4. Check if the migration of repository is done correctly and deal with required additional arrangements

I couldn’t find a verification method that is both automatic and reliable. I visually inspected the commit logs, the tags and branches, the directory structure, and the latest version of the files (by comparing the directory checked out from SVN with the directory cloned from Git using diff -r).
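Part of the branch and tag inspection can be scripted. The sketch below only compares names, not contents. It assumes two text files prepared beforehand: one with the output of `svn ls` on the branches (or tags) directory, and one with the branch (or tag) names listed by Git. The function name is hypothetical.

```shell
# Prints the names that exist in the SVN listing but not in the Git listing.
#   $1: file with the output of `svn ls <url>/branches` (entries end with /)
#   $2: file with one Git branch name per line, for example from
#       `git branch --format='%(refname:short)'`
missing_in_git() {
    tmp=$(mktemp -d)
    sed 's#/$##' "$1" | LC_ALL=C sort -u > "$tmp/svn"
    LC_ALL=C sort -u "$2" > "$tmp/git"
    LC_ALL=C comm -23 "$tmp/svn" "$tmp/git"
    rm -rf "$tmp"
}

# Usage: missing_in_git svn-branches.txt git-branches.txt
```

A name that is reported here is not necessarily an error; it may be a branch that was renamed or cleaned up on purpose.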

In any case, it is necessary to compare the existing SVN checkout directory with the migrated Git working tree using diff -r. Since the tools do not migrate empty directories, this is the point where we should check whether we need them. If so, here is an explanation of the method to add empty directories.
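A minimal sketch of this comparison, which leaves out the metadata directories of both tools so that only real differences are reported (the function and directory names are placeholders):

```shell
# Compares an SVN checkout with a Git clone, ignoring .svn and .git.
# Prints the differing paths and returns non-zero if there are any.
compare_trees() {
    diff -r -q -x .svn -x .git "$1" "$2"
}

# Usage: compare_trees foo-svn-checkout foo-git-clone
```

Directories that exist only on the SVN side, including empty ones, show up as "Only in" lines, which makes this a convenient place to spot the empty directories mentioned above.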

If we use subgit, adding the SVN revision (if desired) is the only thing left to do. Other than that, there is nothing else left to do before pushing the repository.

If we use reposurgeon, we need to clear the tags it creates for empty commits and delete again the branches that had already been deleted in SVN.

The command below is handy in reducing the size of the final version of the repository:

$ cd foo-clone 
$ git gc --aggressive

5. Push the repository to its counterpart in Git and create access for the project team

We used Bitbucket as the Git repository.

To do this step, you must first create the repository in the Bitbucket project. For example, the repository address for foo is:

ssh://git@gitrepository/foo.git

Then, I took the clone that I had created with subgit, mirrored, annotated with SVN revisions, and shrunk with git gc, and pushed it to the Bitbucket repository with the following operations:

$ cd foo-clone
$ git remote rm origin
$ git remote add origin ssh://git@gitrepository/foo.git
$ git push --all origin
$ git push --tags origin

At Kartaca, we manage project access with LDAP, and we granted access while creating the projects. The team kept accessing the existing codebases and projects the same way as before; all we had to do was grant them access.

At this point, we are almost ready to close down SVN and start using the repository on Bitbucket. The only things holding us back are whatever still depends on the SVN repository, such as the Bamboo configuration and the version script (items 7 and 8), or a codebase owner who wants the team to gain some more experience first and decides to do item 6.

If team members still have uncommitted changes, they can copy all the files in their SVN working directory over the newly cloned directory (overwriting the existing files) and delete the .svn directories to start using the Bitbucket repository.
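The copy can be sketched as follows. This variant differs slightly from the description above: instead of copying everything and deleting the .svn directories afterwards, it skips them while copying. The function and directory names are placeholders.

```shell
# Copies every file of an SVN working copy over a fresh Git clone, skipping
# the .svn metadata, so that uncommitted changes show up in `git status`.
carry_over_changes() {
    svn_wc=$1   # path to the SVN working copy with uncommitted changes
    git_wt=$2   # path to the newly cloned Git working tree
    (cd "$svn_wc" && find . -name .svn -prune -o -type f -print) |
    while IFS= read -r f; do
        mkdir -p "$git_wt/$(dirname "$f")"
        cp -p "$svn_wc/$f" "$git_wt/$f"
    done
}

# Usage: carry_over_changes foo-svn-checkout foo-git-clone
```

This only makes sense if the SVN working copy is up to date with the revision that was migrated; otherwise the copy would silently revert newer changes in the clone.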

6. Clone the newly created repository to the team members and allow them to do tests and other activities for learning purposes based on the request of the project team

Once we have the migrated repository on Bitbucket and we have created access for the project team, we can continue using SVN and test the Bitbucket repository at the same time. The team can clone this new repository to their local environment and work on it to learn.

As the migration methods I mentioned are a one-time process, we need to perform the migration again after finishing the tests due to the SVN updates during the process. It means that we should repeat steps 3, 4, and 5. (In Bitbucket, the project’s repository can be deleted and created again). The migration process usually is not long, so there is no harm in doing this.
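Because the same steps are repeated, it can help to collect them in a small script. The sketch below strings together the subgit and Git commands used earlier in this post for a codebase with the standard layout. It assumes that subgit is on $PATH and that the Bitbucket repository has already been deleted and re-created. It leaves out the step that adds the SVN revision numbers, and the `run` and `remigrate` names are made up. With DRY_RUN=1 it only prints the commands.

```shell
# Prints a command and runs it, unless DRY_RUN=1 is set.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Repeats steps 3 to 5 for one codebase.
remigrate() {
    name=$1      # e.g. foo
    svn_url=$2   # SVN address of the codebase
    git_url=$3   # address of the empty, re-created Bitbucket repository
    authors=$4   # path to the authors file

    run rm -rf "$name.git" "$name-clone" &&
    run subgit import --svn-url "$svn_url" --authors-file "$authors" \
        --trunk trunk --branches branches --tags tags "$name.git" &&
    run git clone --mirror "$name.git" "$name-clone" &&
    run git -C "$name-clone" gc --aggressive &&
    run git -C "$name-clone" remote rm origin &&
    run git -C "$name-clone" remote add origin "$git_url" &&
    run git -C "$name-clone" push --all origin &&
    run git -C "$name-clone" push --tags origin
}

# Usage (dry run first, then for real):
#   DRY_RUN=1 remigrate foo file:///home/serdar/svn-to-git/svn/foo/ \
#       ssh://git@gitrepository/foo.git ../kartaca_authors.txt
```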

7. and 8. If there is a version script or CI/CD tools, make them point to the new repository

Our purpose at this stage is to make sure that we point anything that is still dependent on SVN to the new Git repository.

In our case, these are the version script and the configuration of Bamboo, the tool we use for CI/CD. You may have other dependencies, depending on your projects. We proceed on the assumption that the owner of the codebase is aware of all of them.

Since the Bamboo configuration is the responsibility of our Product and System Administrator, we need to coordinate with them too. We cannot complete the migration before completing these steps.
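To get a first list of what still depends on SVN, the scripts and configuration files of a project can be searched for the old SVN address. This is a rough sketch with a hypothetical function name. It only finds textual references inside the given directory, so settings stored elsewhere, such as the Bamboo configuration, still have to be checked separately.

```shell
# Lists the files under a directory that still mention the old SVN address,
# skipping the metadata directories of SVN and Git.
find_svn_references() {
    old_address=$1   # e.g. the host name of the SVN server
    dir=$2           # directory to search, e.g. a working tree
    find "$dir" \( -name .svn -o -name .git \) -prune -o \
        -type f -exec grep -lF -e "$old_address" {} +
}

# Usage (placeholder host name): find_svn_references svndepoadresi foo-git-clone
```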

10. Close down the SVN codebase

Once the owner of the codebase and the Product and System Administrators decide together to start using the new repository (and we complete item 9, if necessary), we stop using the SVN repository and switch to Bitbucket. The owner of the codebase informs the Product and System Administrators that the SVN repository can now be archived.
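One possible way to make sure nobody keeps committing to the old codebase by accident is a pre-commit hook that rejects every commit. This is not something we describe doing above; it is only a sketch that relies on the standard SVN behavior of refusing a commit when hooks/pre-commit exits with a non-zero status. It assumes direct access to the repository directory on the server, and the function name is hypothetical.

```shell
# Installs a pre-commit hook that rejects all commits and tells the
# committer where the codebase lives now.
#   $1: path to the SVN repository directory on the server
#   $2: address of the new Git repository (plain text, no quotes or $ signs)
install_readonly_hook() {
    repo_dir=$1
    new_url=$2
    cat > "$repo_dir/hooks/pre-commit" <<EOF
#!/bin/sh
echo "This codebase has moved to Git: $new_url" >&2
exit 1
EOF
    chmod +x "$repo_dir/hooks/pre-commit"
}

# Usage: install_readonly_hook /path/to/svn/foo ssh://git@gitrepository/foo.git
```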

General Notes on the Process

Since we were going to use Bitbucket, I followed the Bitbucket Migration Guide. The method shown in that document uses git-svn.

When I tried to migrate the codebase foo with git-svn, it took me about 4 hours and resulted in an error. But when I checked, it seemed as if it created the repository. So I moved on to the cleaning step (creation of tags and branches, etc.) using Atlassian’s tool, but it didn’t work.

I was planning to continue with the instructions from the Git book. Then I researched further and came across an opinion, voiced most prominently by Eric S. Raymond, which said:

“Using git-svn as a bridge is reasonable, but it’s not ideal for migration.”

Therefore, though I was reluctant at first, I started searching for other tools.

At first glance, Eric Raymond’s tool reposurgeon looked pretty complicated to me. That’s why I started with subgit, which I had used before for migrating from SVN to Git and found easy to use. It migrated the repository foo, with multiple branches and 3200 revisions, successfully, apart from the time it took: 7 hours! For some reason, the other repositories did not take nearly as long, though they had many revisions as well.

In the meantime, I read Eric Raymond’s migration guide and articles on other tools on the market. He says that subgit failed more than half of his tests (but this may have changed over time).

Due to my respect for him, I decided to take his opinions into account and move forward with reposurgeon. It worked incredibly fast, but unfortunately, it failed to complete the migration of the codebase foo. I couldn’t figure out why.

I couldn’t see much of a problem in the codebase foo after migrating it with subgit, so I decided to move on with it, but I tried reposurgeon with other repositories. Within minutes, I was able to migrate a few codebases, some of which included about 8000 revisions. Due to the intensive use of branches and tags in one of the codebases, I needed to do some more work after the migration. (Deleting a branch in SVN means deleting its directory. As reposurgeon creates tags for such commits, these tags show us which commits deleted directories and branches.)

Apart from the codebase foo, I successfully migrated several other codebases within minutes using subgit. (I couldn’t figure out why foo took so long to migrate.) For all these repositories, subgit did not leave me with any extra work, except for adding the SVN revision numbers and running git gc.

I tried to migrate moo as a single repository without branches. I couldn’t do it with subgit, but I could do it with reposurgeon using the branchify command.

Result

In my opinion, subgit was the most hassle-free tool. (By the way, Bitbucket and GitLab also use subgit for import and gateway jobs.) I recommend that you start with subgit and, if it doesn’t work, move on to reposurgeon or another tool.

Author: Serdar Yüksel

Date Published: Oct 23, 2018