With an ESXi box at home and one in a datacenter, I'm having all kinds of fun toying with interesting computer experiments. This weekend's endeavor was to replace (more or less) Dropbox.

There are a few key things Dropbox does for me:

  • Back up files to a remote location, securely (more or less)
  • Keep versioned copies of the files I back up (indefinitely)
  • Provide remote access to files

Dropbox does more than this, but this is all I use it for. I decided that this all sounds eerily like what a GIT repository can do.

CAVEAT AHOY!

I do not edit my documents in any place but my home (source) machine. Dropbox is more about backup than sync for me. This fact greatly simplifies my workflow, such that this project doesn't have to deal with "the document was modified on my iPhone and desktop -- auto resolve/raise issue" etc. Continuing on...

Overview

The general idea here is to set up a network share over Samba which I can mount on my Windows 7 machine and work with as normal. As I work, changes will be auto-committed, and once in a while the whole git repo will get mirrored off to my backup machine.

More Specifically

A GIT repository, one which is mirrored off-site, will provide all the primary features of Dropbox: remote backup, versioning, and remote access (with a bit of extra work). However, I don't want to think about GIT while I am doing these things -- like Dropbox, I just want a folder that stays backed up and safe.

I use Windows as my primary OS. I do my development, putzing about, and gaming on this machine. I'm slowly paring it down to a beefy graphics card, RAM, and a few SSDs -- leaving my ESXi box to hold loads of large HDs and keep a low, but always on, power signature. So, when I wanted a convenient place to store files, I put together a Samba share on my CentOS 6.2 minimal install. Toss it under LVM so I can add more space later (surprisingly not that hard), and I've got an expandable, safe file storage area. (All my physical disks on the ESXi box sit in some kind of RAID mirror.)

Once I've connected to that share and mounted it, I have a Dropbox-esque area to save files, work on them, and generally use.

Automatic Versioning & Syncing

Like I said, I don't want to deal with GIT add, commit, branch -- I don't want to see it at all, to be honest. I just want whatever is in the share to be saved. GIT is pretty good about tracking changes, be it renames, moves, edits, deletes, or new files -- you just git add -A && git commit -am 'files changed' and you're good to go. The trick is getting that to trigger.

Thankfully, Linux has a nice helper called inotifywait, part of the inotify-tools package, which will wait for various events to occur on your watched folders/files and then let you do something. In my case, the "something" is the GIT sequence above.
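Here is a minimal sketch of what such a watch-and-commit loop could look like. It is not the exact script I ran: the function names and the WATCH_DIR variable are made up for illustration, and the choice of events to watch is an assumption you may want to tune.

```shell
#!/usr/bin/env bash
# Sketch only: WATCH_DIR and both function names are hypothetical.

# Stage and commit everything in the repository at $1.
# Returns 0 without committing when nothing has changed.
commit_changes() {
    local repo="$1"
    (
        cd "$repo" || exit 1
        git add -A || exit 1
        # `git status --porcelain` prints nothing when the tree is clean.
        if [ -z "$(git status --porcelain)" ]; then
            exit 0
        fi
        git commit -q -m 'files changed'
    )
}

# Block until something changes under $1, commit, then wait again.
watch_and_commit() {
    local repo="$1"
    # Without -m, inotifywait exits after the first matching event,
    # so each pass through the loop handles one burst of activity.
    # The --exclude pattern keeps GIT's own .git directory out of the watch.
    while inotifywait -q -r \
            -e modify -e create -e delete -e move \
            --exclude '/\.git(/|$)' "$repo" >/dev/null; do
        commit_changes "$repo"
    done
}

# Only start watching when WATCH_DIR is set, e.g.
#   WATCH_DIR=/path/to/share ./autocommit.sh
if [ -n "${WATCH_DIR:-}" ]; then
    watch_and_commit "$WATCH_DIR"
fi
```

Checking git status before committing keeps the log free of failed "nothing to commit" attempts when an event fires but the content ends up unchanged.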

With that in place watching the share, I now have "if something changes, there's a record" setup. Awesome.

The syncing part is quite simple -- you have a remote git repo, you push to it with git push --mirror remote and you're set. Toss that into a cronjob and kick back.
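As a rough sketch, the push can be wrapped in a tiny script that cron calls. The function name, the MIRROR_REPO and MIRROR_REMOTE variables, the script path, and the hourly schedule in the comment are all placeholders of mine; the remote itself is assumed to have been added beforehand with git remote add.

```shell
#!/usr/bin/env bash
# Sketch only: mirror_push, MIRROR_REPO and MIRROR_REMOTE are hypothetical names.

# Push every ref in the repository at $1 to the remote named $2.
# --mirror also removes refs on the remote that no longer exist locally.
mirror_push() {
    local repo="$1" remote="$2"
    ( cd "$repo" && git push --quiet --mirror "$remote" )
}

# Example crontab line (hourly, on the hour); paths are placeholders:
#   0 * * * * MIRROR_REPO=/path/to/share MIRROR_REMOTE=remote /path/to/mirror.sh
if [ -n "${MIRROR_REPO:-}" ]; then
    mirror_push "$MIRROR_REPO" "${MIRROR_REMOTE:-remote}"
fi
```

Because --mirror makes the remote an exact copy of the local refs, it only makes sense when the backup repository has no other writers, which is the case in this one-way setup.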

Take Away Notes

Not everything went perfectly smoothly:

  • inotify-tools, for CentOS 6.2, is in EPEL. I did a minimal install, so I didn't have EPEL and needed to add it.
  • Windows 7 can't log in as two different users to the same network host (or share, for that matter). Fortunately, Windows' idiocy is also our saving grace: Edit your HOSTS file, add several aliases for the same IP address, and use those to connect as different users.
  • You can run inotifywait in a loop or in daemon mode. If you run it in a loop, it's possible that while the commands in the loop are running, you'll miss file modifications because you're not "waiting" anymore. In daemon mode, you never miss anything, but running commands seemed trickier. Personally, I just liked the loop better, and it didn't really end up mattering.
  • The loop process ran into the cronjob once or twice, so I had one process trying to git add/commit stuff while another was trying to git push. Between that and needing inotifywait to be in a constant loop, I may end up dropping the bash script and moving to a single script that attempts a git add -A && git commit every 5 minutes, and then follows up with a push on the hourly update. However, doing so would mean losing the incremental updates that inotify can give me right now.
  • The first time you drag & drop several gigabytes of files into the directory, you'll cause a series of git commits that are just KILLER. But that's to be expected -- if you run git init && git add -A && git commit -a on a 100GB directory, you're gonna be in for a long wait no matter what you do.
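One possible way around the commit-versus-push collision mentioned above, which I have not tried in this setup, is to make both jobs take the same lock before touching the repo. The sketch below uses flock from util-linux; the with_lock name, the 60 second timeout, and the lock file path in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: with_lock is a hypothetical helper name.

# Run a command while holding an exclusive lock on the file $1.
# Waits up to 60 seconds for the lock, then gives up with a non-zero status.
with_lock() {
    local lockfile="$1"
    shift
    (
        # File descriptor 9 is opened on the lock file by the redirect below.
        flock -w 60 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}

# Usage idea: wrap both the commit step and the cron push, e.g.
#   with_lock /path/to/repo.lock git push --mirror remote
```

The lock is released automatically when the subshell exits, so a crashed commit or push can't leave the other job blocked forever.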