Peter's Blog

Redefining the Impossible

Items filed under rsync


I'm so taken with my new SliceHost VPS that I've moved this blog over to it. For the first time in four years this blog is hosted on something other than Apache: Nginx. It's still on Drupal, but running under PHP 5 for the first time (needing only this fix).

The DNS has only just propagated, so this is my first post on the new host. To me it feels snappier when navigating around, a definite improvement.

I've been moving all my sites over to the new host; I've decided to put all my eggs in one basket. Well, not quite: I've also signed up for a minimal rsync.net account, giving me just over 3GB of backup space. It will cost me circa £2.50 a month but will be worth it for the peace of mind.

I've put all my sites and my most important /etc directories (such as /etc/nginx) into a subversion repository. The subversion FAQ has a great tip for checking /etc into subversion:

How can I do an in-place 'import' (i.e. add a tree to Subversion such that the original data becomes a working copy directly)?

Suppose, for example, that you wanted to put some of /etc under version control inside your repository:

  1. svn mkdir file:///root/svn-repository/etc -m "Make a directory in the repository to correspond to /etc"
  2. cd /etc
  3. svn checkout file:///root/svn-repository/etc .
  4. svn add apache samba alsa X11
  5. svn commit -m "Initial version of my config files"

This takes advantage of a not-immediately-obvious feature of svn checkout: you can check out a directory from the repository directly into an existing directory. Here, we first make a new empty directory in the repository, and then check it out into /etc, transforming /etc into a working copy. Once that is done, you can use normal svn add commands to select files and subtrees to add to the repository.

There is an issue filed for enhancing svn import to be able to convert the imported tree to a working copy automatically; see issue 1328.
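The four FAQ steps can be scripted. Here is a minimal Ruby sketch that only builds and prints the command lines, assuming the FAQ's example repository URL and directory names. The method name `in_place_import_commands` is my own invention, and the actual `system` call is left commented out so nothing touches /etc by accident.

```ruby
# Sketch: the FAQ's in-place import as a list of svn commands.
# The repository URL and directory names are the FAQ's examples (assumptions).

# Build the commands that turn +dir+ into a working copy of +repo_url+
# and put the chosen +subdirs+ under version control.
def in_place_import_commands(repo_url, dir, subdirs)
  [
    # 1. Make an empty directory in the repository to correspond to +dir+.
    ["svn", "mkdir", repo_url, "-m", "Make a directory in the repository to correspond to #{dir}"],
    # 2. Check it out *into* the existing directory, making it a working copy.
    ["svn", "checkout", repo_url, dir],
    # 3. Schedule only the chosen subtrees for addition.
    ["svn", "add", *subdirs.map { |s| File.join(dir, s) }],
    # 4. Commit them.
    ["svn", "commit", "-m", "Initial version of my config files", dir]
  ]
end

commands = in_place_import_commands("file:///root/svn-repository/etc", "/etc",
                                    %w[apache samba alsa X11])
commands.each do |cmd|
  puts cmd.join(" ")                                  # show what would run
  # system(*cmd) or abort("failed: #{cmd.join(' ')}") # uncomment to run (as root)
end
```

Keeping each command as an argument array means `system(*cmd)` runs svn directly, without a shell re-parsing the commit messages.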

So all the juicy stuff is in subversion, and I rsync the subversion repository over to rsync.net for the backup (NB: don't rsync a live svn repository while anyone else is modifying it!). The repositories on the Slicehost server let me roll back changes should the need arise (or, more likely, see what I've changed to break something).
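One way to respect that caveat is to snapshot the repository with `svnadmin hotcopy`, which is intended to make a safe copy even while other processes are using the repository, and then rsync the snapshot rather than the live files. This is a swapped-in technique rather than what I describe above, and the paths and the `user@backup.example.com` account are hypothetical placeholders:

```ruby
# Sketch: take a consistent copy of a live repository with `svnadmin hotcopy`,
# then mirror that copy offsite with rsync over ssh.
# Paths and the remote account are hypothetical placeholders.

# Commands to run, in order.
def offsite_backup_commands(repo, snapshot, remote)
  [
    ["rm", "-rf", snapshot],                        # hotcopy wants a fresh destination
    ["svnadmin", "hotcopy", repo, snapshot],        # safe while the repo is in use
    ["rsync", "-a", "--delete", "-e", "ssh",        # push the snapshot, not the live repo
     "#{snapshot}/", remote]
  ]
end

offsite_backup_commands("/var/svn/repos", "/var/backups/svn-snapshot",
                        "user@backup.example.com:svn-backup/").each do |cmd|
  puts cmd.join(" ")
  # system(*cmd) or abort("failed: #{cmd.join(' ')}")  # uncomment to run
end
```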

If anything does go wrong with my Slicehost slice (not saying it will, but it might), I can rent a new VPS or dedicated server and have my stuff back up and running in less than a day (DNS propagation to the new server would be the delaying factor). It's not perfect redundancy but I'm not shafted.

Probably a more likely scenario is that my server gets pwned, in which case I would remaster it.

The rsync copy between Slicehost and rsync.net runs at 583850.15 bytes/sec (roughly 570KB/s). Pretty good.

Possible improvements:

  • Use Duplicity to backup the repositories, with a full+incremental scheme.
  • Put a web front end on the subversion repositories so I can browse them (although Subclipse, http://subclipse.tigris.org/, is working nicely).

UPDATE: I've noticed this article getting hits from people looking for how to copy a subversion repository. The simple answer is to just copy the files: cp -rf, rsync -a, however you like copying things. If the repository is live (i.e. people are using it), or the copy will run on a different version of subversion, then use a dump and load instead:

svnadmin dump path-to-repository > dump.dat
cp dump.dat {wherever}
cd {wherever}
svnadmin create path-to-new-repository
svnadmin load path-to-new-repository < dump.dat

Filed under: rsync slicehost subversion svn


I had my rsync scripts set up and running. I had backup and get scripts so I could back up my work to a server, go home, download the changes, do some work, upload the changes, and download them again at work. Who needs a flash drive?

However, there's this thing called 'finger trouble' that happens sometimes: you forget to back up the changes on one machine or another and suddenly, bam, your work vanishes. At least that is what I think happened; all I know is that twice now I have lost some work. Running backup and get scripts to synchronise two PCs against a server is too error-prone for me.

So I set up unison which, once running, should be a less error-prone solution. It synchronises the files on my home and work PCs with the files on the server. Changes may be propagated either way: unison keeps a record of each file's state at the last sync, so it can tell which side has changed since then. I can change the files on either PC and unison will help me resolve any conflicts I create. While it is synchronising it prompts me with each file that has changed and asks me what to do with it, so I can see if I am about to overwrite an evening's work.
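The "which side changed?" decision is exactly where finger trouble bites when done by hand. The following is a deliberately simplified Ruby model of the three-way comparison a synchroniser like unison makes against its record of the last sync. It is an illustration, not unison's code, and `reconcile` and its result symbols are hypothetical names.

```ruby
# Simplified model of two-replica reconciliation (NOT unison's actual code).
# Each argument is a file's content fingerprint (e.g. a digest) or nil if the
# file is absent; +base+ is what both replicas held after the last sync.
def reconcile(base, a, b)
  a_changed = a != base
  b_changed = b != base
  if !a_changed && !b_changed then :nothing        # neither side touched it
  elsif a_changed && !b_changed then :copy_a_to_b  # includes a deletion on A
  elsif b_changed && !a_changed then :copy_b_to_a  # includes a deletion on B
  elsif a == b then :nothing                       # same change made on both sides
  else :conflict                                   # different changes: ask the user
  end
end
```

For example, `reconcile("v1", "v2", "v1")` returns `:copy_a_to_b`, while different edits on both sides return `:conflict`, which is the point where unison stops and asks.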

I have the following unison profile file set up in my .unison directories on both PCs:

#
# Unison profile for project X
#
root = /cygdrive/c/projects/X
root = ssh://me@myserver.com//home/pcw/X

ignore = Name Downloads/*
ignore = Name Backup/*
ignore = Name Backups/*
ignore = Name Tmp/*
ignore = Name *.map
ignore = Name *~
ignore = Name *.obj
ignore = Name *.o
ignore = Name *.i

backup = Name *
backupdir = f:\users\pcw\BACKUP\X

If the profile file is called X.prf then I can synchronise this project by running

unison X

Because the profile is in my private .unison directory I don't even need to cd to the project directory.

The 'backup' and 'backupdir' lines at the end of the profile tell unison to maintain an extra local backup, in this case on our main file server such that the backups fall under our company tape backup procedure.

It has worked for a day now with fewer problems than rsync gave (ironically, unison uses the rsync algorithm under the hood), and it is fast, taking less than a minute to work on 491 files.

To maintain straight backups unison would be overkill but where changes could be made to the files in more than one place it is looking good.

Of course, I could use unison with a flash drive rather than a file server but I don't have to remember to carry the server around with me.

And I know about synctoy but unison works easily through an ssh tunnel.


Filed under: backup rsync unison


Another backup setup is under construction. This time I'm backing up my project, developed on a Windows box, to a Linux server using rsync, with Ruby as the scripting language. If this script is invoked thus:

ruby Backup.rb

then it will merely update a simple backup of the source directory on the server. rsync is very fast: it only spends time uploading files that have actually changed, and for those it only uploads the changed parts. I am using this on a 30MB source archive and it takes less than a minute to update the backup. The backup files are just copies of the original files, so they are easy to browse, diff and restore.

If the script is invoked thus:

ruby Backup.rb --backup

then it will use rsync's magic --link-dest option to create a historical archive of backups. The script does this:

  • Move the current main backup to an archive directory and give that archive directory a name that is date/time stamped.
  • Create a new backup. Where a file has not changed since the previous backup, the new backup gets a hard link to the existing copy in the archive rather than consuming more disk space. The new backup directory still lists every file, but only files that are new or have changed take up additional space.

There is nothing in the script to limit the number of backups but they can be manually pruned every few months as required.
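The hard-link trick in the list above is the whole point of --link-dest, so here is a minimal local Ruby sketch of its effect. It assumes that matching size and modification time mean "unchanged" (roughly rsync's default quick check), works between local directories only, and is not how rsync is implemented; `snapshot` is a hypothetical name.

```ruby
require "fileutils"
require "find"

# Local sketch of the effect of rsync's --link-dest (not rsync's algorithm):
# build +new_snap+ from +src+; a file whose size and mtime match its copy in
# +prev_snap+ becomes a hard link to that copy, anything else is copied fresh.
def snapshot(src, new_snap, prev_snap)
  Find.find(src) do |path|
    rel  = path.delete_prefix(src).delete_prefix("/")
    dest = File.join(new_snap, rel)
    if File.directory?(path)
      FileUtils.mkdir_p(dest)
      next
    end
    prev = prev_snap && File.join(prev_snap, rel)
    if prev && File.file?(prev) &&
       File.size(prev) == File.size(path) &&
       File.mtime(prev).to_i == File.mtime(path).to_i  # compare whole seconds
      File.link(prev, dest)                     # unchanged: share the archived copy
    else
      FileUtils.cp(path, dest, preserve: true)  # new or changed: real copy, keep mtime
    end
  end
end
```

After two runs, an unchanged file in the second snapshot shares its inode with the first (`File.stat(path).ino` is equal), so it costs a directory entry rather than another copy.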

This is the script:

#
# This script is used to back up the source code to an offsite server.
#
# If the parameter '--backup' is provided then create a backup directory.
#
strMeAtMine = "me@myserver.org"
strLocalFolder = "/cygdrive/c/projects/757/"
strTargetFolder = "/home/pcw/757"
strBackupFolder = "/home/pcw/Backup/757"

#
# If asked to backup then move previous backup to a backup directory
# given a name related to date/time.
#
if ARGV.include?('--backup')
  bBackup = true

  strTimeStamp = Time.now.strftime('%Y-%m-%d-%H-%M')

  strBackupDir = "#{strBackupFolder}/#{strTimeStamp}"

  strCommand = "ssh #{strMeAtMine} mv #{strTargetFolder} #{strBackupDir}"

  system(strCommand)
else
  bBackup = false
end

#
# Determine location of rsync excluded file list
#
strDir = File.dirname(__FILE__)
strExcludeFile = File.join(strDir, "rsync-exclude.txt")

#
# Build command to invoke in an array of strings since this allows me to comment what
# each parameter does.
#
strCommand = [ "rsync",
#              "-n",                                    # -n = dry run
               "-v",                                    # verbose
               "-a",                                    # archiving options
               "--delete",                              # delete files no longer used from target
               "--chmod=u=rwX",                         # set target file permissions
               "--exclude-from=#{strExcludeFile}",      # exclude rubbish
               "--delete-excluded",                     # delete excluded files from target
               "-e ssh",                                # use ssh tunnel
               strLocalFolder,                          # source
               "#{strMeAtMine}:\"#{strTargetFolder}\""  # target
             ]

if bBackup
  #
  # Backing up so instead of uploading everything again, link to files in the
  # backup directory where there are no changes. insert(-3, ...) places the
  # option after "-e ssh", i.e. just before the source and target arguments.
  #
  strCommand.insert(-3, "--link-dest=#{strBackupDir}")
end

system(strCommand.join(" "))

Since this is a backup of files from a windows box to linux I used the "--chmod=u=rwX" to specify simple file permissions on the linux end. Without this rsync was tending to create files with no access permissions for anyone.

The 'rsync-exclude.txt' file lives in the same directory as the script, which finds it via File.dirname(__FILE__). This is a list of stuff that doesn't need backing up:

/Downloads
*.map
Backup/*
Backups/*
*~

This uses the cygwin version of rsync which uses ssh to talk to the remote server. I have ssh set up with key files so I don't need to enter passwords.


Filed under: backup rsync ruby


I've been playing with unison recently as a means of replicating my collection of 3000 photos between two computers: I nearly lost all the family photos once before, and I have been neglecting backups. With my new strategy of having the Dell D410 fire up in its docking station every night to do backups, it can also synchronise the photo collection with the 500m, which runs all the time.

The advantage of unison over rsync is that it synchronises a pair of directories, i.e. if you add, edit or delete files in one copy then the other copy will be updated accordingly. This is bi-directional: changes made to the second copy will be replicated in the first. In the case of my photos, I always use the 500m to copy them from the camera but I am likely to use the D410 to manipulate them, and what I want is for changes in one archive to be fed into the other.

The invocation of unison that I have come up with so far is:

unison -batch -fastcheck true <path1> <path2>
-batch
tells unison not to ask me stupid questions; it propagates what it can automatically and leaves conflicts alone. -force can be used to resolve conflicts in favour of one of the paths.
-fastcheck true
tells it to detect changes by file size and modification time rather than by reading the full contents of every file, which is what it does by default under Windows.
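To show what -fastcheck trades away, here is a small Ruby sketch of the two ways of asking "has this file changed since the last sync?". It models the idea behind the option rather than unison's implementation; `changed?` and `record` are hypothetical helpers, and SHA-256 stands in for whatever fingerprint a real tool keeps.

```ruby
require "digest"

# What we remember about a file after a sync (a hypothetical record format).
def record(path)
  st = File.stat(path)
  { size: st.size, mtime: st.mtime.to_i, digest: Digest::SHA256.file(path).hexdigest }
end

# Has +path+ changed since +archived+ was recorded?
def changed?(path, archived, fastcheck:)
  st = File.stat(path)
  if fastcheck
    # Cheap: trust size + modification time; no file contents are read.
    st.size != archived[:size] || st.mtime.to_i != archived[:mtime]
  else
    # Thorough: read every byte and compare a digest.
    Digest::SHA256.file(path).hexdigest != archived[:digest]
  end
end
```

The fast path only misses an edit that leaves both the size and the timestamp untouched, which is rare for photos but possible; in exchange it avoids reading 3GB of JPEGs on every run.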

I have been using a native windows build of unison rather than the cygwin version, mainly because the windows version handles windows file names transparently, including network file names. cygwin unison via ssh does not seem to be able to find any of the file paths I give it (e.g. /cygdrive/c/blah, the cygwin form of c:\blah) and doesn't like network path names (e.g. \\server\share).

Still, it is not working properly for my photo collection yet, possibly because the network is not reliable enough to move 3000 photos and 3GB of data without a single glitch. This may be another sign that the Wifi on the 500m is iffy (iffy wifi, ha!), as the D410 in its docking station has wired networking available.

Have I mentioned recently that the d410 is really nice? The 500m screen seems huge when I go back to it but the d410 is so fast, small and light.


Filed under: backup rsync unison


Backing up linux ubuntu intranet stuff to central windows server using rsync and smbfs. I did this as follows:

  • Create a new Windows user with limited privileges, apart from the right to write backup files. The intranet server will have permanent access to the Windows server, so its privileges should not be too generous.
  • Install smbfs:
    sudo apt-get install smbfs
    
  • Add the following line to /etc/fstab:
    //{server name}/{share name} /mnt/{target} smbfs rw,username={windows username},password={windows password},uid={linux username},gid={linux username} 0 0
    
    It must be a single line with no spaces inside the options list; fstab has no line continuation. uid and gid are necessary to get write permission.
  • Mount share:
    sudo mount /mnt/{target}
    
  • Create a backup directory on the windows server. Since it is only a backup I made this compressed.
  • Add a cron entry to copy stuff over every night:
    34 3 * * 1-5 rsync -a --delete --exclude='.*' /home/pcw/StuffToBackup/ /mnt/{target}/users/PCW/BACKUP
    
    Again this must be a single line, since crontab does not support backslash continuation. Windows does not like file names that start with a . so I exclude these.
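One risk with a cron job writing to /mnt/{target} is that if the mount ever fails, /mnt/{target} is just an empty local directory and rsync will quietly fill the local disk instead. A minimal Ruby guard, assuming Linux's /proc/mounts format; the helper names are hypothetical and the paths mirror the cron entry above:

```ruby
# True if +mount_point+ is the mount point (2nd field) of some line in
# +mounts_text+, which is expected in /proc/mounts format.
def mounted?(mount_point, mounts_text)
  mounts_text.each_line.any? { |line| line.split[1] == mount_point }
end

# The rsync argument list for the nightly copy, or nil if the share is not
# mounted (so a failed mount never turns into a local-disk backup).
def backup_command(target, mounts_text)
  return nil unless mounted?(target, mounts_text)
  ["rsync", "-a", "--delete", "--exclude=.*",
   "/home/pcw/StuffToBackup/", "#{target}/users/PCW/BACKUP"]
end
```

A cron wrapper would call `backup_command("/mnt/target", File.read("/proc/mounts"))` and pass the result to `system(*cmd)` only when it is not nil. Mount points containing spaces appear escaped (as `\040`) in /proc/mounts, which this sketch does not handle.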

Filed under: backup linux rsync samba ubuntu


Needed to copy some files from someone's Windows 2000 box, but they were away. How to do it?

  • Boot PC with knoppix
  • mount -t ntfs /dev/hda1 /mnt/hda1
  • CD to desired directory
  • Use rsync to copy files to server

Job done. Passwords? Who cares?

The boot didn't go smoothly: it seemed to have problems with the sound adapter and X wouldn't start properly. I got tired of waiting for X, so I pressed ALT-F4 to get a login console and did it in text mode. I really should memorise the boot option for a runlevel without X.

Repeat after me:

knoppix 2

Moral: encrypt your files.


Filed under: knoppix rsync


This is the script I am using to back up my debian dedicated server to my ubuntu desktop. It uses ssh and rsync. It uses the cool rsync link-dest trick so that instead of creating multiple copies of the same file, it creates only one copy of the file with multiple hard links to it. I have my ssh keys set up so I don't need to give a password to log in via ssh.

This uses a 'pull' technique: the desktop reads the files from the server using this script.

This is not entirely efficient in that it creates a new set of backup files even if nothing has changed: if you run the script ten times in a row you will end up with ten identical sets of files (thanks to the hard links they take up little extra space, but they do push older backups out of the rotation). However, it backs up a web site that changes every day, so running it once a day is reasonable.

The next job is to put selected files within the backup set into subversion. I decided against using subversion for everything, as I can't see a way to make it pick up deleted files automatically, but I'd like to put the main SQL dump into subversion.

#!/bin/bash

#
# Rotate old backups, then pull a fresh copy:
#   $1 = remote directory to back up (user@host:/path)
#   $2 = local backup directory
#
rotate() {
    local src="$1" dest="$2"

    # Ripple old backups: drop Backup9, shift Backup1..Backup8 up by one and
    # turn the previous 'latest' into Backup1 (skipping slots not yet created).
    rm -rf "$dest/Backup9"
    for i in 8 7 6 5 4 3 2 1; do
        if [ -d "$dest/Backup$i" ]; then
            mv "$dest/Backup$i" "$dest/Backup$((i + 1))"
        fi
    done
    if [ -d "$dest/latest" ]; then
        mv "$dest/latest" "$dest/Backup1"
    fi

    # Copy current version to latest, creating hard links where files have not changed.
    rsync -avz --delete --exclude=.svn --link-dest="$dest/Backup1" -e ssh "$src/" "$dest/latest/"

    #
    # Put a date stamp in the backup directory.
    #
    echo "Hello Peter" > "$dest/latest/Backup-$(date +%Y-%m-%d)"
}

rotate sshusername@ssh.server.address:/var/www/petersblog.org /home/peter/Backup/petersblog.org
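The ripple step in `rotate` is fixed at nine slots (Backup1 to Backup9). Here is a minimal Ruby sketch of the same ripple with the number of slots as a parameter, keeping the script's `latest` and `BackupN` names. `ripple_plan` and `apply_plan` are hypothetical, and the rsync pull would still run afterwards.

```ruby
require "fileutils"

# Compute the renames for a ripple over +keep+ slots, oldest first, so no
# rename ever lands on a directory that still exists.
def ripple_plan(keep)
  plan = [[:delete, "Backup#{keep}"]]
  (keep - 1).downto(1) { |i| plan << [:move, "Backup#{i}", "Backup#{i + 1}"] }
  plan << [:move, "latest", "Backup1"]
end

# Carry out a plan inside +dir+, skipping slots that do not exist yet.
def apply_plan(dir, plan)
  plan.each do |op, from, to|
    src = File.join(dir, from)
    case op
    when :delete then FileUtils.rm_rf(src)
    when :move   then File.rename(src, File.join(dir, to)) if File.exist?(src)
    end
  end
end
```

Computing the plan separately from applying it makes it easy to print a dry run first.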

Filed under: backup debian rsync ssh ubuntu


I have implemented a daily backup from my oneandone dedicated server to my home desktop PC. I used rsync over ssh, following the notes I wrote myself, which work with Ubuntu as well as Windows. It is only backing up the websites; I'm not backing up the configuration yet. I could add a backup of /etc, but that has to run as root. The PC is set up to turn itself on every day via the BIOS; it will run the backup, do any other daily stuff I can think of, then switch itself off.

At the server end, just before the scheduled backup, the MySQL databases for the websites are all dumped. The SQL for this site is now 9MB, although a lot of that appears to be server logs.

Once the backup has reached the desktop pc I'd like to create some form of historical backup and I am deciding whether to create incremental differences or chuck everything into subversion. I am leaning towards the latter.

One option for the server's daily activities could be to download podcasts and dump them onto my phone...


Filed under: backup mysql rsync ssh ubuntu


I was using rdiff-backup for incremental backups, but I'm fed up with it raising meaningless assertion errors (ref rdiff-backup woes, Backup Strategy).

A quick google found this article with a nice simple incremental backup strategy. I've written this bash script to implement this strategy:

#!/bin/bash
# $1 = source dir
# $2 = backup dir

# Compare directory trees: if no changes then do not backup as this wastes a backup slot
diff -r --brief "$1" "$2/Backup0" &> /tmp/DailyDiff
status=$?   # keep diff's exit status: 0 = same, 1 = different, 2 = trouble

if [ "$status" -eq 1 ]; then
    # Ripple old backups
    rm -rf "$2/Backup9"
    for i in 8 7 6 5 4 3 2 1 0; do
        mv "$2/Backup$i" "$2/Backup$((i + 1))"
    done
    # Copy current version to Backup0, creating hard links where files have not changed.
    rsync -a --delete --link-dest=../Backup1 "$1/" "$2/Backup0/"
elif [ "$status" -eq 2 ]; then
    # Report any diff errors. (Testing $? here would see the if-test's status, not diff's.)
    echo "Diff returned an error"
    cat /tmp/DailyDiff
fi

rm /tmp/DailyDiff

I created all the Backup? directories by hand to avoid errors, all empty to begin with.

Adding this to crontab:

15  23 * * 1-5 /home/pcw/DailyBackup /home/pcw/Projects /home/pcw/Backup/Projects

gives me incremental backups every weekday using just rsync, a tool that has not let me down so far.
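The `diff -r --brief` test at the top of the script is what stops identical runs from eating backup slots. A Ruby sketch of the same content-based check, assuming only regular files matter (unlike diff, it ignores empty directories); `tree_files` and `trees_differ?` are hypothetical names:

```ruby
require "fileutils"
require "find"

# Map each regular file's path relative to +root+ to its full path.
def tree_files(root)
  files = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    files[path.delete_prefix(root).delete_prefix("/")] = path
  end
  files
end

# True when the trees differ in which files exist or in any file's contents.
def trees_differ?(a, b)
  fa = tree_files(a)
  fb = tree_files(b)
  return true unless fa.keys.sort == fb.keys.sort
  fa.any? { |rel, path| !FileUtils.compare_file(path, fb[rel]) }
end
```

Like `diff -r`, this reads every file on both sides, so on a large tree the check itself can take a while.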


Filed under: google rsync


I decided to put a new backup strategy in place at work. I have my desktop PC running windows and an Ubuntu server. I wanted to back up my day-to-day work under windows to the server. I wanted incremental backup so I have the option to backtrack through file history if necessary.

rsync is a nice utility for copying a set of files from one PC to another and works under Windows (via Cygwin) and Linux. It can copy over ssh, so I can use my ssh keys to avoid having to log into the server or put my password in scripts. However, it does not do incremental backups; it just duplicates.

rdiff-backup is a nice backup tool that can do cross-network incremental backups. It uses the rsync protocol so it is very efficient. It is also easy to use, no weird command line switches, just give it the name of the source and target directories. However, support for this on windows is not straightforward and it relies on using a cygwin version of python rather than the standard distribution.

So, a compromise solution, use both. I have set things up so that this is done every night when I go home:

cd c:\Projects
rsync -avz --exclude-from="rsync.cnf" -e ssh ./ pcw@rd-pcw2:Projects/ > backup.log 2>&1
blat backup.log -to pcw@itl.co.uk

This copies files from my 'Projects' directory to the server. The "rsync.cnf" file is a set of things to exclude from the copy, e.g.:

#
# Doxygen output files
#
- Doxygen/

#
# Anything downloaded
#
- Download/
- lstfiles/
- ofiles/
- *.bak
- *.Bak

#
# Anything generated by py2exe
#
- build/
- dist/

#
# Anything in a folder called Old
#
- Old/

#
# VC build directories
#
- Debug/
- Release/
- debug/
- release/

#
# Miscellaneous.
#
- *.obj
- *.tmp
- *.pyc
- setup/*.exe
- Output/setup.exe

After running this I use blat to email me what happened so I know it succeeded.
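One gap in that report: without a `2>&1` redirection only rsync's standard output reaches backup.log, so an error message on stderr might never reach the email, and the log never states the exit status. A minimal Ruby sketch of a wrapper that captures both streams and writes a one-line verdict at the top; `run_and_log` is a hypothetical helper, while `Open3.capture2e` is Ruby's standard-library call for merged output:

```ruby
require "open3"

# Run +cmd+ (an argument array), capture stdout and stderr together, and write
# a log whose first line says whether it worked. Returns true on success.
def run_and_log(cmd, log_path)
  output, status = Open3.capture2e(*cmd)
  verdict = status.success? ? "OK" : "FAILED (exit #{status.exitstatus})"
  File.write(log_path, "#{verdict}: #{cmd.join(' ')}\n#{output}")
  status.success?
end
```

For example, `run_and_log(["rsync", "-avz", "--exclude-from=rsync.cnf", "-e", "ssh", "./", "pcw@rd-pcw2:Projects/"], "backup.log")` would replace the shell redirection, after which blat can mail backup.log as before and the subject of the problem is on the first line.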

On the server I have crontab set up to run rdiff-backup every night after the files have been uploaded:

0 18 * * * rdiff-backup /home/pcw/Projects /home/pcw/Backup

This system gives me two full copies of my project files and incremental backups to boot.

Todo: rdiff-backup to a different disk, giving three copies.

