S3 Subversion Solutions
Several S3 Subversion solutions already exist, mainly because mounting S3 as a file system can, in theory, meet the minimum requirements for hosting an svn repo on S3 (sometimes with major downsides).
- s3vn: http://www.cs.cornell.edu/projects/quicksilver/public_pdfs/ladis-s3vn.pdf, http://s3vn.msigi.net/ (not ready for production)
- s3fs: https://code.google.com/p/s3fs/wiki/InstallationNotes (stores full files, local cache important, but cache needs to be manually purged: https://code.google.com/p/s3fs/issues/detail?id=159 and https://code.google.com/p/s3fs/wiki/FuseOverAmazon [find “unbounded”])
- s3backer: https://code.google.com/p/s3backer/ (http://www.turnkeylinux.org/blog/exploring-s3-based-filesystems-s3fs-and-s3backer)
- s3ql: https://code.google.com/p/s3ql/wiki/Installation (also splits files to deduplicate)
A workaround for the limitations above
- We can copy the db/revs/ files to S3, mount that bucket using s3fs, and then symlink from the normal revs folder to the s3fs mount after an appropriate delay, so that S3 can reach consistency before we rely on it (via s3fs). This still requires local space for the other (non-revs) svn folders, as well as a buffer for any revisions that have not yet been transferred. However, 5 GB is probably more than sufficient, which is small compared to the full size of the repo.
- The script could be agnostic about the fs mount. It could simply take a configuration for the delay to account for the eventual consistency. Thus, the general functionality could be used to offload the revs to any type of FUSE system with relative safety as long as the FUSE system provides reliable read access after a predictable (and small) delay.
- Purging the local file (and replacing it with a symlink) could be done safely by checking “lsof filename” (list open files). If the file is not listed, then it is safe to remove. If it is listed, then perhaps it should be touched to update the mtime and give us a new window. We would not want to remove the file while it is being written, but it could be moved to a temporary file on the same file system to preserve active file handles while allowing the symlink to take its place for future file handles.
- Rev files that have not yet been replaced can be found with a simple call from within db/revs/:
find . -type f -regex '.*/[0-9]+' (or similar; -type f skips the symlinks, and -regex has to match the whole path)
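For a script that needs the same list programmatically, the find call above can be approximated in Python. This is only a sketch: the function name unlinked_rev_files is made up for illustration, and it assumes that rev files are the only entries under db/revs/ whose names consist entirely of digits.

```python
import os
import re

# Rev files in an fsfs repo are named after their revision number.
REV_NAME = re.compile(r"[0-9]+")


def unlinked_rev_files(revs_dir):
    """Yield regular files under revs_dir with all-digit names.

    Symlinks are skipped, so files that have already been replaced by a
    link to the offload mount are not reported again.
    """
    for dirpath, _dirnames, filenames in os.walk(revs_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not REV_NAME.fullmatch(name):
                continue
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            yield path
```

Because os.walk descends into subdirectories, this should also cover a sharded revs layout, where the numeric directory names themselves are never reported as files.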
Proof of concept
The revs of an fsfs svn repo were successfully reconfigured to be symlinks to a different device while they were not being accessed. It was used in production with small user loads. The pseudo-code for a cron script would be:
foreach (rev_nonlink_files as rev_file)
    if (!offload_file.exists)
        cp rev_file offload_file
    else if (!rev_file.in_use)
        rm rev_file
        ln -s offload_file rev_file
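The pseudo-code above could be fleshed out roughly as follows. This is a sketch under several assumptions, not the script that was actually used: the names offload_pass, in_use, settle_seconds, is_busy and now are hypothetical, the offload directory is assumed to be flat (one file per revision number), the mount is assumed to report the copy time as the mtime of the copied file, and lsof is assumed to be installed. Instead of rm followed by ln -s, it creates the symlink under a temporary name and renames it over the original, which avoids a moment where the rev file does not exist; processes that already have the old file open keep their handle.

```python
import os
import shutil
import subprocess
import time


def in_use(path):
    """Return True if lsof reports an open handle on path."""
    result = subprocess.run(
        ["lsof", path], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def offload_pass(rev_files, offload_dir, settle_seconds=300,
                 is_busy=in_use, now=time.time):
    """Run one cron pass; return (copied, linked) lists of rev file paths."""
    copied, linked = [], []
    offload_dir = os.path.abspath(offload_dir)
    for rev_file in rev_files:
        offload_file = os.path.join(offload_dir, os.path.basename(rev_file))
        if not os.path.exists(offload_file):
            # First pass for this file: copy only, swap on a later pass.
            shutil.copyfile(rev_file, offload_file)
            copied.append(rev_file)
            continue
        # Give the remote store time to become consistent.
        if now() - os.path.getmtime(offload_file) < settle_seconds:
            continue
        if is_busy(rev_file):
            continue
        # Build the link next to the rev file, then rename it into place.
        tmp_link = rev_file + ".symlink-tmp"
        os.symlink(offload_file, tmp_link)
        os.replace(tmp_link, rev_file)
        linked.append(rev_file)
    return copied, linked
```

A cron job would call offload_pass with the current list of non-symlink rev files and the path of the mounted bucket. Passing is_busy and now as parameters keeps the function independent of lsof and of the clock, which makes it easier to try out against a scratch directory first.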
This initiative was abandoned after AWS price changes. The savings from reducing EBS storage shrank to the point that this more complex solution was no longer worth pursuing. However, I’m posting it in case someone has a larger repo and/or wants to work through the challenges of backing up the entire repo to S3 to further reduce expenses.