This document is a draft; presumably it will move to the handbook if and when it is finalized.

These are best practices for using Subversion from the perspective of the MAP team. They do not have to be your practices if you have a reason not to use them.

Practices 

  1. Each commit should contain one logical change.  Each commit should have a clear log message focusing on the why of a change, not on the how.  Documentation on how the new code works belongs in the code, not in the commit log.
  2. Each project should have its own repository.  The definition of a "project" is vague; in many cases, separate deployable units are separate projects.
  3. The top level of a repository should contain directories named "trunk", "branches", and "tags".  The trunk should be the mainline of development, and should be copied into subdirectories of branches and tags when needed.  Branches can be used for new feature development (forging ahead of the trunk) or release maintenance (lagging behind the trunk, perhaps cherry-picking changes from the trunk).  Tags can be used for snapshotting released code and similar purposes.  Once a tag is created by making a copy of the trunk, do not change it.
  4. Avoid checking in third-party material when possible.  Instead use infrastructure such as Maven to handle third-party dependencies.
  5. Avoid checking in generated objects when possible.  Subversion repositories should store sources, not builds.

Rationale and Caveats

1. One change per commit

The history of a project can be a valuable tool for understanding how a bug was introduced, how a bug was fixed, or why code is the way it is.  The more coherent and organized the project history, the more valuable the history will be in this respect.  Even if you are the only developer on a project, others may be looking at the history of your code in the future.  If you use Subversion merely as a snapshot mechanism without organizing and documenting the code's history, they won't get much out of it.

The downside: one change per commit will tend to slow down development.  If, in the process of making a change, you notice something else wrong in a file you are working on, you won't be able to just fix it "while you are there" without compromising the unity of your commit; instead, you'll have to make a note and come back to it later.  If for some reason you can't finish a commit before moving on to a different task, you may have to stash away the changes (perhaps committing them to a short-lived branch) and revert them in order to work on the next task.

2. One project per repository

Like CVS, Subversion allows you to check out subdirectories of a repository, making it easy to have sub-projects.  It can be tempting to have one big repository for an organizational unit with many independent sub-projects, but we advise against this practice.  If two units of code will not be branched, released, and deployed together, they should typically not live in the same repository.  Even if the two code bases share a dependency such as a library, you can use infrastructure such as Maven to handle the code sharing rather than piling it all into one repository.  Reasons to use separate repositories include:

  • It is easiest to manage access control, commit email settings, and the like if projects are contained within their own repository.
  • If maintenance of a piece of code moves between business units and the code has to move to a different repository, it is not generally possible to preserve its history.  If each project lives in its own repository, the repository can simply be reconfigured to have different ownership, and the history stays intact.
  • Not all version control tools gracefully support sub-projects.  If in the future you ever decide to convert from Subversion to git or another such tool, you may have an easier time if you have been using one project per repository.
  • Although Subversion scales fairly well to large amounts of code, you will to some degree see better performance if your repositories remain at a manageable size, which is easier to ensure with one project per repository.

Now for the caveats:

  • Separate repositories are separate universes as far as Subversion is concerned.  You cannot move code between two repositories (except via export/import which discards history), and you cannot branch and tag repositories together.
  • Repositories have some amount of administration overhead, and dozens of repositories may take more work to provision and maintain than just one.  This concern may be alleviated with better tools on the server infrastructure in the future.
  • If you should ever reconsider your repository layout, it is easier (though still wizard-level) to split a single repository into multiple pieces than to combine several repositories into one.

3. Trunk, branches, and tags

This is a standard Subversion convention.  Some tools may be better able to visualize your repository if you follow this convention, and other developers will have an easier time understanding your code if you follow it.
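The convention is simple enough to check mechanically.  The following is a minimal sketch in Python, not an established tool: it assumes you have already obtained the names at the top level of a repository (for example, by listing the repository root), and the helper name missing_standard_dirs is hypothetical.

```python
# The three directory names come from the convention described above.
STANDARD_DIRS = {"trunk", "branches", "tags"}


def missing_standard_dirs(top_level_entries):
    """Return the standard directories absent from a repository's top level.

    top_level_entries is an iterable of names; a trailing "/" (as directory
    listings often print) is ignored.
    """
    names = {entry.rstrip("/") for entry in top_level_entries}
    return sorted(STANDARD_DIRS - names)
```

For example, missing_standard_dirs(["trunk/", "src/"]) returns ["branches", "tags"], while a repository that follows the convention yields an empty list.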

4. Avoid third-party material

Particularly in Java projects using Ant, it can be common to check in third-party .class and .jar files which are needed to build the source code for a project.  This practice can greatly expand the size of your repository (with consequences for checkouts) and can make it more difficult to identify the actual project code when browsing the repository.  It can also lead to version skew due to the work required to update third-party dependencies.  It is better to use Maven or a similar tool to manage third-party dependencies if possible.
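One way to notice such files before they are committed is to scan the list of paths about to be added.  The sketch below is illustrative only: the function name flag_binaries is hypothetical, and the suffix list covers just the two file types mentioned above; a real project might need more.

```python
# Suffixes taken from the examples in the text above; extend as needed.
BINARY_SUFFIXES = (".class", ".jar")


def flag_binaries(paths):
    """Return the paths that look like compiled Java artifacts."""
    return [p for p in paths if p.lower().endswith(BINARY_SUFFIXES)]
```

Given ["src/Main.java", "lib/junit.jar", "build/Main.class"], this returns the last two paths, which are candidates for replacement by a managed dependency or a build step.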

5. Avoid generated objects

Generally speaking, if file Z is automatically created from files X and Y, don't check it in.  Use your build process to create it when necessary.  Storing generated files in your repository obscures your project's history and invites version skew (where the sources change and the generated object does not, or vice versa).
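To make the idea of version skew concrete, here is a small Python sketch.  It assumes, purely for illustration, that a digest of the sources was recorded when Z was generated; the function names and the mapping of source names to contents are hypothetical.  A build process avoids the problem entirely by regenerating Z, so this is a way to picture the failure mode rather than a recommended workflow.

```python
import hashlib


def sources_digest(source_contents):
    """Digest a mapping of source name -> bytes, independent of dict order."""
    h = hashlib.sha256()
    for name in sorted(source_contents):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(source_contents[name])
        h.update(b"\0")
    return h.hexdigest()


def is_skewed(recorded_digest, source_contents):
    """True if the sources no longer match the digest recorded at generation."""
    return recorded_digest != sources_digest(source_contents)
```

If X changes after Z was generated and checked in, is_skewed reports True for the old digest: the checked-in Z no longer corresponds to the sources beside it.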

1 Comment

  1. One project per repository may prove clunky for my team (IDD).  We have dozens of small projects, and relatively few committers.  We don't really need access control that is that fine-grained, and we also need to be able to create new projects fairly rapidly.