Dirvish enhancements

Dirvish is an rsync-based backup tool. It is intended to back up to disk drives, or to whatever looks like one under your OS. I came across it while reading an article in c't, where it was mentioned together with rsnapshot but described as the more elaborate backup solution.

I now use dirvish to back up computer1 to computer2 and vice versa. I mainly want to protect myself from head crashes, but I also keep several copies of the data, since they take little disk space thanks to hard links.

Unfortunately dirvish is a "pull" solution: it fetches the files from a backup client (or from itself) and stores them in a dedicated directory. This is fine for backing up computer2 to computer1, but it raises two problems for the opposite direction (computer1 is saved on computer2 or, which is the same, computer2 pulls the files from computer1), concerning:

time (computer1 does not run all day)
access (computer2 cannot log in to computer1 automatically, as it is more exposed and has fewer rights)

For a better understanding: computer1 is an ADSL-connected home computer and computer2 is a server on the Internet.

Consequently, it would be nice to have dirvish push backups, too. Dirvish can save files local-to-local and remote-to-local. My patch hides the local-to-remote case inside the first one (local-to-local). This requires two steps:

to have a local representation of the remote bank (e.g. mounted with sshfs), but
also to tell rsync to do real local-to-remote transfers

The first step alone would be enough to let dirvish treat the backup as "local-to-local", but in its rsync phase dirvish would then copy whole files over the network instead of transferring only checksums and changed data. The benefits of rsync are what I chose dirvish for, so I don't want to lose them.

My patch:

calls rsync with a relative --link-dest path instead of an absolute one
extends the 'bank:' option so that it can name two access points for the remote bank (a local and a remote representation). The formal definition changes to:
bank: path [remote-path] (L)
Without 'remote-path' the option behaves as in the original dirvish. Otherwise, neither 'path' nor 'remote-path' may contain whitespace, and 'remote-path' is an rsync-style remote target that leads to the same directory as the local 'path'.
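Why the relative --link-dest matters can be shown in a few lines. This is a minimal Python sketch, not the patch's Perl code; the vault name "home" and the image names are made up for illustration, and the bank/vault/image/tree layout is assumed. rsync resolves a relative --link-dest against the destination directory, so the same value works whether the destination is named by the local mount point or by the remote target.

```python
import os.path


def relative_link_dest(reftree: str, desttree: str) -> str:
    """Express the reference tree relative to the destination tree.

    rsync interprets a relative --link-dest relative to the destination
    directory, so the result does not depend on how the bank is reached.
    """
    return os.path.relpath(reftree, start=desttree)


# Hypothetical paths: bank / vault / image / tree.
bank = "/usr/backup/remote.bank"
reftree = os.path.join(bank, "home", "2007-05-28", "tree")
desttree = os.path.join(bank, "home", "2007-05-29", "tree")

link_dest = relative_link_dest(reftree, desttree)
print("--link-dest=" + link_dest)
```

An absolute path such as /usr/backup/remote.bank/home/2007-05-28/tree would only make sense on the machine where the bank is mounted; the relative form avoids that.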

How to use local-to-remote copy?

Installation: download the patch and execute "patch /path/to/dirvish /path/to/dirvish-remote.patch"
Have the remote bank mounted somewhere locally.
Add a "bank: local remote" option to your config, e.g.:
```
bank: /usr/backup/remote.bank backup.server.com:remote.bank
```
where "/usr/backup/remote.bank" is the local mount point of the bank and "backup.server.com:remote.bank" is the rsync-style remote host representation. Copying a file with rsync (or scp) using the very same remote string should work flawlessly. Adapt your ssh configuration to get rid of any password prompts, etc.
Anything copied there should show up in the local representation as expected.
Do the rest as usual, i.e. create the corresponding vaults and config files in the bank, if you use this kind of setup.

Useful tips:

Use the 'speed-limit:' config option if you are behind an unshaped DSL line, otherwise the upload will saturate your link. I thought the link had dropped while dirvish was running, until I found the source of the problem.
I mount the local representation with sshfs. You need FUSE for this. Unfortunately, it's dead slow on my setup. If someone can give me a hint on how to solve this, my email is at the bottom of this page. Otherwise, it's a fine solution. I have an fstab entry for easy mounting:
```
/etc/fstab:
sshfs#my.server.com:bank /usr/backup/remote.bank fuse reconnect 0 0
```
As you can see, I use the "reconnect" option. That's handy. Please note that you also need the /sbin/mount.fuse script for that syntax to work (comes with FUSE since version 2.4.0).

Download the patch here

What does it do?

Change the $reftree variable to be relative to $destree (for rsync --link-dest). This is necessary when writing to a remote target and does no harm otherwise.
Let 'bank:' still work with directory names that contain unquoted spaces. Only if that fails does the patch try to split the argument, to see whether it is a local/remote bank option.
The only other place with a space-separated option is 'tree: path [alias]'. It allows backslash-escaped spaces in the 'path' part. I find this very inconsistent and didn't copy this behaviour. Until there is a general consensus on what the syntax should look like (and possibly a general parser for it), spaces are not allowed in 'local' and 'remote' if both are in use.
Allow different ways to express the remote target: with or without trailing slash, totally omitting the path, or even an rsync:: server.
Use the "remote" bank option for the rsync call.
Move duplicated code (creation of the $$Options{bank} variable) into a subroutine, to avoid duplicating my lines, too.
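The handling of the 'bank:' value described in the list above can be sketched roughly as follows. This is an illustration in Python under my reading of the rules, not the Perl code from the patch: the function names parse_bank and remote_tree are hypothetical, and "failure" is assumed to mean that the whole value is not an existing directory.

```python
import os.path
from typing import Callable, Optional, Tuple


def parse_bank(value: str,
               is_dir: Callable[[str], bool] = os.path.isdir
               ) -> Tuple[str, Optional[str]]:
    """Split a 'bank:' value into (local_path, remote_target or None)."""
    value = value.strip()
    # Original behaviour first: the whole value is one directory name,
    # possibly containing unquoted spaces.
    if is_dir(value):
        return value, None
    # Fallback: exactly two whitespace-separated parts, local and remote.
    parts = value.split()
    if len(parts) == 2 and is_dir(parts[0]):
        return parts[0], parts[1]
    raise ValueError(f"not a usable bank: {value!r}")


def remote_tree(remote: str, subpath: str) -> str:
    """Append a path inside the bank to an rsync-style remote target."""
    if remote.endswith(":"):
        # "host:" with the path omitted: continue right after the colon.
        return remote + subpath
    # With or without trailing slash, or an rsync "host::module" target.
    return remote.rstrip("/") + "/" + subpath


# Pretend these directories exist, so the sketch runs anywhere.
known_dirs = {"/usr/backup/remote.bank", "/usr/backup/my bank"}
exists = known_dirs.__contains__

print(parse_bank("/usr/backup/my bank", exists))
print(parse_bank("/usr/backup/remote.bank backup.server.com:remote.bank", exists))
print(remote_tree("backup.server.com:remote.bank/", "home/2007-05-29/tree"))
```

Trying the unsplit value first is what keeps existing configurations with spaces in the bank name working; the split is only a fallback.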

Saving file ownership information as non-root

I came across this problem while setting up dirvish to make my backups. Dirvish uses rsync and works heavily with hardlinks to make backups on normal filesystems. I like this solution, but a non-privileged user can't chown() files.

As a result, dirvish (like many other backup solutions) needs to run as root to make perfect-looking copies. This is not my preferred solution, as I don't want automated root access over the network open on my machines.

The idea behind any better solution is to wrap filesystem accesses. Normal operations pass through to the real filesystem, which forms the back-end, while actions that need special rights are intercepted and their results stored separately in one or more files. These special actions are changing ownership information and creating device nodes.

The only two implementations I know of are fakeroot and pretendroot. Both work as library wrappers: they use the glibc library preload (LD_PRELOAD) mechanism to intercept library calls.

I thought about implementing a FUSE filesystem (user space fs) to do similar stuff before I came across fakeroot.

The difference between fakeroot and pretendroot is that fakeroot tries harder to provide a root-like environment, but its permanent storage interface is broken. Fakeroot keeps the additional information in memory, in a daemon that can load it from a file on startup and save it on termination. Unfortunately, unresolved race-condition bugs make it easy to lose information, and a system crash during a lengthy backup has the same effect. The bug is this easy to trigger:

$ fakeroot -s new_save_file mknod node c 1 2
$ ls -la new_save_file
-rw-r--r-- 1 siemer siemer 0 2006-08-17 14:06 new_save_file

As you can see, no information got stored at all. If you load "new_save_file" in another fakeroot environment, "node" will look like a normal file.

pretendroot, on the other hand, stores ownership information in files in a directory as it goes, with no complicated load/save cycle. Its disadvantages are that it is less widespread and supports only ownership information: you can't create device files.

Both programs associate files with the additional information by inode and filesystem device number. I would prefer a filename-based solution, with an extra file in every directory where one is needed, like umsdos did, as far as I remember... (I used it for 10 minutes some years ago.)
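To make the filename-based idea concrete, here is a small Python sketch. It is purely hypothetical: neither fakeroot nor pretendroot works this way, and the side-file name ".ownership" and the JSON format are my own assumptions. Each directory gets one extra file that maps file names to the owner they are supposed to have.

```python
import json
import os
import tempfile

SIDE_FILE = ".ownership"  # hypothetical name of the per-directory file


def _side_file(path):
    """Return (side file path, entry name) for a given file path."""
    directory, name = os.path.split(os.path.abspath(path))
    return os.path.join(directory, SIDE_FILE), name


def record_owner(path, uid, gid):
    """Remember the intended owner of `path` in its directory's side file."""
    side, name = _side_file(path)
    table = {}
    if os.path.exists(side):
        with open(side) as fh:
            table = json.load(fh)
    table[name] = {"uid": uid, "gid": gid}
    # Write a temporary file and rename it over the old one, so readers
    # never see a half-written table.
    tmp = side + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(table, fh)
    os.replace(tmp, side)


def lookup_owner(path):
    """Return the recorded (uid, gid) for `path`, or None if unknown."""
    side, name = _side_file(path)
    try:
        with open(side) as fh:
            entry = json.load(fh).get(name)
    except FileNotFoundError:
        return None
    return (entry["uid"], entry["gid"]) if entry else None


# Usage: pretend a backed-up file belongs to root without being root.
with tempfile.TemporaryDirectory() as backup_dir:
    target = os.path.join(backup_dir, "passwd")
    open(target, "w").close()
    record_owner(target, 0, 0)
    print(lookup_owner(target))
```

Because the information is written on every change and keyed by name, nothing is held only in memory, and the records survive copying the directory to another filesystem, which an inode-based table does not.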

Robert.Siemer-klammero(o)backsla.sh
Tuesday, 29-May-2007 12:58:24 CEST