August 28th, 2006

I don’t trust computers not to suddenly explode and destroy everything I’ve ever created, and I’ve discovered what I believe to be the easiest and most foolproof way to back up my code, or anything else for that matter: email it to yourself automatically.

I use Subversion to manage my personal code library on a remote server, and since my goal is for everything important to always exist in two separate places, I have a compressed dump of the entire repository emailed to me weekly. I delete the older versions, but since my email account has plenty of free space it’s not very urgent. This all seems straightforward, but I had to do a lot of research to find the right programs to use, which I am now sharing with you. All you need to do is set up a cron job like this…

0 6 * * 0 (svnadmin dump ~/svn/repo/ | gzip --best > ~/svn/repo-backup.svndump.gz) 2>&1 | mutt -s 'repo-backup.svndump.gz '$(date +\%m/\%d/\%Y) -a ~/svn/repo-backup.svndump.gz

Obviously, replace the repository path and the email address with the correct values, and pick a location for the backup file too. This particular job is scheduled to run at 6am every Sunday. The repository (~/svn/repo) is dumped and compressed (using gzip, at the highest level of compression) to the file "~/svn/repo-backup.svndump.gz". I use mutt to send the email with the file attached. The subject is the name of the file plus the date the email was sent (to keep backup emails from being grouped together). I also want the messages that svnadmin outputs (which contain information on what was dumped, and any error messages) to be the body of the email, but svnadmin writes those messages to stderr while mutt reads the body of the email from stdin. To fix this, stderr is redirected to stdout after stdout has been used to compress the dumpfile, so only the messages travel down the pipe to mutt. One cron-specific catch: an unescaped % in a crontab command is turned into a newline, so each percent sign in the date format needs a backslash in front of it. Isn’t Unix great?
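
If the redirection in that cron line looks like magic, here is a minimal sketch that reproduces it without touching a real repository. The fake_dump function is a hypothetical stand-in for svnadmin dump (data on stdout, progress messages on stderr), and cat stands in for mutt; neither is part of my actual setup.

```shell
#!/bin/sh
# Hypothetical stand-in for `svnadmin dump`: the payload goes to stdout,
# progress messages go to stderr.
fake_dump() {
    echo "dump data"
    echo "* Dumped revision 0." >&2
}

out_file=$(mktemp)
trap 'rm -f "$out_file"' EXIT

# Inside the parentheses, stdout is consumed by gzip and ends up in the file.
# The 2>&1 on the subshell then points stderr at the pipe, so only the
# messages reach the next command (cat here, mutt in the cron job).
body=$( (fake_dump | gzip --best > "$out_file") 2>&1 | cat )

echo "email body: $body"
echo "file contents: $(gzip -dc "$out_file")"
```

The point to notice is the order: the data has already been diverted into the file by the time 2>&1 is applied, so the pipe carries nothing but the messages.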

Anyway, if you use Gmail, besides having plenty of space to hold these backups, you can also make the process easier with plus-addressing. Let’s say your email is ""; then set the email address in the example to "". Set up a label called "Backups" in Gmail and a filter that automatically applies the "Backups" label to emails sent to "". You can keep them out of your Inbox if you have the filter archive them as well.
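
For anyone who hasn’t seen plus-addressing before, the rule is simply "insert +tag before the @". A small sketch of that rule in shell follows; the plus_address helper and the example.com address are made up for illustration and are not taken from my setup.

```shell
#!/bin/sh
# Hypothetical helper: build a plus-address from a base address and a tag.
# Usage: plus_address ADDRESS TAG
plus_address() {
    local_part=${1%@*}   # everything before the last @
    domain=${1##*@}      # everything after the last @
    printf '%s+%s@%s\n' "$local_part" "$2" "$domain"
}

# Illustrative address only.
plus_address "someone@example.com" "backups"
```

Mail sent to the resulting address still arrives in the base mailbox, which is what lets a filter pick it out by its "To" address.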

Some other important things…

  • Keep in mind there are size limits for attachments in emails, but this isn’t much of a problem as long as your repository stays fairly small (and full of text files).
  • You don’t have to send them to just one address; check the mutt manual page for more information.
  • Using bzip2 rather than gzip would be a better idea, as bzip2 has recovery features if something goes wrong, but currently Gmail has a major bug when viewing emails with bzip2-compressed files attached. These emails are improperly handled and won’t open through Gmail.
  • I prefer a plain svnadmin dump since I won’t be making any commits at exactly that moment, but more active repositories should probably take a copy of the repository with svnadmin hotcopy instead, as a safety precaution.
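
Two of the points above (attachment size limits and hotcopy) are easy to sketch in shell. Treat this as a rough illustration under assumptions rather than what I actually run: the function names and the MAX_ATTACHMENT_BYTES default are hypothetical, so look up your mail provider’s real limit, and dumping from the hotcopy snapshot is just one way to combine the two commands.

```shell
#!/bin/sh
# Hypothetical limit in bytes; replace with your provider's real limit.
MAX_ATTACHMENT_BYTES=${MAX_ATTACHMENT_BYTES:-10000000}

# Succeeds (exit status 0) if the file is small enough to attach.
# Usage: fits_in_email FILE
fits_in_email() {
    # Arithmetic expansion strips the padding some wc versions print.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$MAX_ATTACHMENT_BYTES" ]
}

# Snapshot the repository with hotcopy, then dump the snapshot instead of
# the live repository.
# Usage: dump_via_hotcopy REPO_PATH OUTPUT_FILE
dump_via_hotcopy() {
    snapshot=$(mktemp -d) || return 1
    svnadmin hotcopy "$1" "$snapshot/repo" &&
        svnadmin dump "$snapshot/repo" | gzip --best > "$2"
    status=$?   # status of hotcopy if it failed, otherwise of gzip
    rm -rf "$snapshot"
    return "$status"
}
```

A script could call dump_via_hotcopy first and only hand the file to mutt when fits_in_email succeeds, sending a plain warning email otherwise.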

2 Responses to “Backup”

  1. Samuel Says:

    Very nice info. Thanks!

  2. Jonathan Says:

    Ollie, glad the guide is helpful!

    If you’re working with other people who are using Git exclusively, then you’ve got a whole host of other worries to deal with. From my understanding, when you rebase against the svn trunk, git-svn rewrites each commit to be in accordance with svn’s linear commit numbering. That means that when you commit a change with Git hash 12345, and then git svn dcommit (or git rebase svn/trunk), the commit ID is changed to something else, say 54321. If you push that change back up to GitHub (at least, if you force push it), you are effectively replacing your pure Git history. This is a problem for collaborators who are getting their information through Git, because commit 12345 (which they had already pulled) is in conflict with (because it is identical to) 54321.

    You can get around some of this by omitting the rebase flag from your git pull. Git merge will detect your conflicts and will usually do a pretty good job of working around them automatically. However, you may find that there is still some funny business: you might end up, for instance, with two versions of every commit (the Git version and the SVN version) in your commit logs.

    For your setup, you might consider using SVN in a slightly dumber way, where at release time you copy your files from Git into a fresh svn checkout of your trunk, svn add any new stuff, and then just commit the changes. That’ll mean that your svn commit history will not be in sync with your Git history, but at least you won’t have any problems with being out of sync with your collaborators. Let me know what you figure out; I’m still learning as I go along with this stuff!
