rsync

I’ve talked about some of my favorite Windows-based programs, but the best program EVER is rsync.  I don’t go a work day without using rsync.  I use rsync for two major purposes:

1) Deployments: We work off a local dev server for all our projects.  Historically we’d SFTP the files up to the live environment once we were done.  This was fine for our first deployment, but doing updates sometimes required us to track down files deep in the folder structure, and more often than not we’d forget a file or two.  rsync solved all those issues.  Now when we launch the site we just do the following:

rsync --dry-run -vaz --exclude-from /var/www/html/website_client/excludesFolder/excludeList.txt /var/www/html/website_client/ myusername@123.123.123.123:/var/www/html/client

Seems simple enough, I know, but no one told me about this.  I just had to figure it out on my own, so I’m helping spread the idea.  It’s not my original idea, so I take no credit (all credit goes to the makers of rsync).  Note: obviously, remove the --dry-run flag to actually MOVE the files.  The excludeList.txt is just a plain text file listing files you don’t want synced up to the production servers.  You know, things like the DB connection file and the .htaccess files that have different paths.
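For reference, an exclude file is just one pattern per line, relative to the source directory you’re transferring.  A hypothetical excludeList.txt (these file names are made-up examples, not from my actual project) might look like:

```
# one pattern per line, relative to the transfer root
db_connect.php
.htaccess
cache/
*.log
```

Patterns ending in a slash match directories, and wildcards like *.log work the way you’d expect from the shell.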

2) Backups: I use rsync for both my local nightly backups and my offsite backups.  Due to limited space (or the desire to optimize space), there’s a trick you can use:

/usr/bin/rsync -vaz --partial --timeout=800 --exclude-from /backup/rsync_exclude.txt --progress --bwlimit=50 -e /usr/bin/ssh --del --link-dest=/backup/2010-06-09 root@source.ip.address:/data/ /backup/2010-06-24

What’s going on there is:

  • -vaz = verbose, archive, compress : i.e. log the files being transferred, grab everything recursively while preserving permissions, ownership, and timestamps, and compress the transfer to reduce bandwidth usage.
  • --partial : in case something goes wrong in the middle of a large file transfer, this will allow you to pick up where you left off.
  • --exclude-from : discussed above, it lists what we don’t want transferred.
  • --bwlimit : throttle the bandwidth so I can still browse the web without huge lag.  Not required for a local rsync, of course.
  • -e : sets the path to your ssh binary (the remote shell to use).
  • --del : so that we get a mirror image, files are deleted from the backup server if they’ve been removed from the origin server.
  • --link-dest : this is the most important one.  It points at a previous copy of the site, i.e. last night’s backup.  I’ll explain more below.
  • source : where you’re getting the files from.
  • destination : where on your backup server you’re putting the files.

I’ve taken all this and placed it into a backup script to dynamically generate the dates and make it easy for me to do the same task on other servers.
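Here’s a rough sketch of what such a wrapper could look like, assuming one snapshot folder per day named YYYY-MM-DD.  The paths, the source host, and the echo-instead-of-execute behavior are all my illustrative assumptions, not the author’s actual script:

```shell
#!/bin/sh
# Build tonight's rsync command with dynamically generated dates.
# BACKUP_ROOT and SRC are placeholder assumptions.
BACKUP_ROOT="/backup"
SRC="root@source.ip.address:/data/"

TODAY=$(date +%Y-%m-%d)
# The most recent existing snapshot becomes the --link-dest reference
PREV=$(ls "$BACKUP_ROOT" 2>/dev/null | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' | sort | tail -n 1)

CMD="/usr/bin/rsync -vaz --partial --del --bwlimit=50 -e /usr/bin/ssh"
# Skip --link-dest on the very first run, when no snapshot exists yet
[ -n "$PREV" ] && CMD="$CMD --link-dest=$BACKUP_ROOT/$PREV"
CMD="$CMD $SRC $BACKUP_ROOT/$TODAY"

echo "$CMD"   # echo so you can eyeball it; run it for real once it looks right
```

The same script then works unchanged on any server, because the dates and the previous snapshot are discovered at run time instead of being hard-coded.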

So what this command does is look at the --link-dest path and start downloading files that have changed or are new.  The really important part, though, is that it creates hard links to the files that are the same/haven’t changed.  This allows you to browse through the folder structure of a backup and see every file that exists…even if it wasn’t backed up that day, BUT it doesn’t take up any additional disk space (well, technically it takes up a few bytes for the link itself, but what’s a few bytes among friends).  When you delete a file you’re actually just deleting a hard link, so as long as there’s a hard link somewhere on the file system the data will still be there.  Nice eh!?
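You can see the hard-link behavior for yourself in a couple of lines of shell.  This is just a toy demo of the mechanism --link-dest relies on (the file names are made up); after one name is deleted, the data is still reachable through the other link:

```shell
#!/bin/sh
# Demo: two names, one set of data blocks on disk
dir=$(mktemp -d)
echo "backup data" > "$dir/monday.txt"
ln "$dir/monday.txt" "$dir/tuesday.txt"   # hard link: same inode, no copy made
rm "$dir/monday.txt"                      # removes one name, not the data
content=$(cat "$dir/tuesday.txt")         # still readable via the surviving link
echo "$content"
rm -r "$dir"
```

The file system only frees the data blocks once the last hard link to them is gone, which is exactly why a month of daily snapshots can cost barely more space than one full copy plus the daily changes.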

So, to summarize, rsync is awesome for both deployments and backups.  If I were stranded on a deserted island and could only have one program, rsync would be it.