
Migration to Subversion and Trac

Background

When I started writing OpenPAM, I was working on PAM integration and other security-related issues in FreeBSD, as a subcontractor for Network Associates (since acquired by McAfee) under the DARPA CHATS program. This involved maintaining a rather large set of patches to the FreeBSD source tree; the most convenient way to do this was to use the FreeBSD project's Perforce depot. Thus, OpenPAM, which grew out of this work, was initially maintained in that depot.

I quickly grew annoyed with Perforce, and especially with how little it improved over time. There was also the issue of control: I did not control the Perforce server, nor the hardware it ran on. Being hosted in FreeBSD's depot also kept OpenPAM closely tied to FreeBSD, which I believe has been a major obstacle to its adoption by other operating systems.

For a long time, however, I saw no clear alternative to Perforce, mainly because I did not want to lose history when switching version control systems. I had long been interested in Subversion, and made several attempts at migrating to it before I finally succeeded in February 2006.

Converting the repository

I searched for a tool which could extract a set of files from a Perforce depot and add them, with full history, to a Subversion repository, and came across Ray Miller's p42svn.

After much experimentation, I concluded that I could not use p42svn unmodified, because of a problem with the way it works around a bug in the Perforce client libraries.

The basic problem is that P4::Print() behaves inconsistently for different file types: if the requested file is a text file, it returns its contents as a string, but if it is a binary file, it prints the contents to STDOUT instead. p42svn works around this as follows: for every file it needs to download (that is, every file changed in every changeset), it forks off a child process, which opens a server connection, calls P4::Print() and prints the result. The parent simply captures the child's STDOUT and thus obtains the data it needs, regardless of the file type.
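To make the workaround concrete, here is a minimal Python sketch of the fork-and-capture pattern. It is not p42svn's Perl code and makes no server connection. `inconsistent_print()` is a hypothetical stand-in for P4::Print() that returns text contents but writes binary contents straight to standard output, and `files` is a hypothetical in-memory depot mapping each path to a `(contents, is_binary)` pair. It relies on `os.fork()`, so it only runs on POSIX systems.

```python
import os
import sys


def write_all(fd, data):
    """Write every byte of data to fd, looping over partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def inconsistent_print(path, files):
    """Hypothetical stand-in for P4::Print(): returns text file contents,
    but writes binary file contents to standard output (fd 1) instead."""
    content, is_binary = files[path]
    if is_binary:
        write_all(1, content)
        return None
    return content


def fetch_via_child(path, files):
    """Fork a child whose stdout is a pipe, let it print whatever it
    fetched, and read the result back from the pipe in the parent."""
    read_fd, write_fd = os.pipe()
    sys.stdout.flush()  # so the child does not re-emit buffered parent output
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)  # fd 1 (stdout) now feeds the pipe
            os.close(write_fd)
            result = inconsistent_print(path, files)
            if result is not None:
                write_all(1, result)  # text came back as a value: print it too
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    os.close(write_fd)  # the parent keeps only the read end
    chunks = []
    while True:
        chunk = os.read(read_fd, 65536)
        if not chunk:  # EOF: every write end is closed, i.e. the child exited
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0:
        raise RuntimeError(f"child for {path!r} failed (wait status {wait_status})")
    return b"".join(chunks)
```

Because the parent only ever reads the pipe, it never needs to know which branch the child took. In the real workaround each child also opened its own server connection, and that per-file connection is where the cost lay.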

This workaround was problematic for me for two reasons. First, my access to FreeBSD's Perforce depot was over SSH across 13 hops with a 200 ms RTT, which meant that connection setup and teardown alone took almost a second for every file. Second, the server was fairly heavily congested and apparently implemented some kind of SSH connection rate limiting, so sooner or later P4::Init() would fail, and p42svn would immediately quit instead of retrying after a short pause.
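For illustration, a retry after a short pause could look like the following Python sketch. It is hypothetical and not part of p42svn (which is written in Perl): `connect` stands for any zero-argument callable that opens a connection and raises `ConnectionError` when the server refuses it, the attempt count and delay are arbitrary, and the `sleep` parameter exists only so the pause can be replaced in a test.

```python
import time


def connect_with_retry(connect, attempts=5, delay=2.0, sleep=time.sleep):
    """Call connect() until it succeeds, pausing between failed attempts.

    connect is a hypothetical zero-argument callable that raises
    ConnectionError when the server refuses or drops the connection.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: give up and report the last failure
            sleep(delay)  # back off briefly before trying again
```

A fixed delay keeps the sketch short; against a rate-limited server, a growing delay between attempts would likely be gentler.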

I ended up making the following changes:

  • p4_init() was modified to always return the same client connection, which is cached in a global variable.
  • All references to P4::Final() were commented out to avoid closing the cached connection.
  • p4_get_file_content() was modified to call P4::Print() directly, without forking. This was safe to do for OpenPAM, because it does not contain any binary files.
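The changes above can be sketched in Python as follows. The function names p4_init() and p4_get_file_content() come from p42svn, but their bodies here are hypothetical and do not reproduce the actual Perl code: `HypotheticalConnection` and its `print_file()` method are stand-ins for a Perforce client connection and P4::Print(), and `DEPOT` is made-up sample data.

```python
class HypotheticalConnection:
    """Hypothetical stand-in for a Perforce client connection (not the real P4 API)."""

    opened = 0  # how many connections have been set up so far

    def __init__(self, files):
        HypotheticalConnection.opened += 1
        self.files = files  # path -> text contents

    def print_file(self, path):
        # Like P4::Print() on a text file: the contents come back as a value.
        return self.files[path]


DEPOT = {"README": "OpenPAM\n", "hello.c": "/* placeholder */\n"}  # hypothetical sample data

_connection = None  # the cached connection, kept in a module-level (global) variable


def p4_init():
    """Return the single shared connection, creating it only on the first call."""
    global _connection
    if _connection is None:
        _connection = HypotheticalConnection(DEPOT)
    return _connection


def p4_get_file_content(path):
    # Called directly in this process: no fork, no new connection per file.
    # Only safe if every file is text, as was the case for OpenPAM.
    # Nothing here closes the connection; it stays open until the process exits.
    return p4_init().print_file(path)
```

In this sketch every call goes through p4_init(), so a whole conversion pays the connection setup cost once rather than once per file; the trade-off is that a dropped connection is never re-established.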

I admit that it's a hack, but it works. The result can be found in the repository.

Last modified on Mar 31, 2012, 9:54:54 PM