Automated backups using rdiff-backup

Introduction

One day your blog, your code or pretty much anything else may crash, and sadly, your most valuable information could be irretrievably lost! Consider the consequences if this ever happens (touch wood!). Pictured them? Scary, right? Now imagine how relaxed you would be instead if only you had bothered to make a backup.

Today I'm going to show you my personal backup method. I use the awesome rdiff-backup tool which combines an incremental backup with a mirror.
You can read more about this tool on the official page.

What is it?
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.

Installation

rdiff-backup is available in most major Linux distributions. In my case, I'm using an Arch Linux-based distribution (Manjaro) and yay (Yet Another Yogurt, an AUR helper written in Go) to install the tool.

yay rdiff-backup

If you use another distribution, you can install the tool with that distribution's package manager:

  • Debian
apt-get install rdiff-backup
  • Fedora/RedHat
yum install rdiff-backup

Using rdiff-backup

Making backups

Making backups is very easy when you are using rdiff-backup. You may picture this tool as similar to the cp command. In other words, rdiff-backup takes two arguments:

  • source directory.
  • target directory.

Either directory can be on a local or a remote disk. For example, if you want to use rdiff-backup with local directories you would use the following command:

rdiff-backup source target
rdiff-backup my_personal_directory my_personal_directory_backup

In the same way, if either of the directories is on a remote server, you only need to indicate the path in the classic way: user@server::PATH. The following commands show how remote and local machines can be used as both the source and the target:

rdiff-backup carloscaballero@guybrush::/docker-volumes/ghost /mnt/backup/carloscaballero 

# from the remote machine called guybrush using the user carloscaballero copy the directory /docker-volumes/ghost to the local directory /mnt/backup/carloscaballero

rdiff-backup /docker-volumes/ghost carloscaballero@guybrush::/docker-volumes/ghost 

# from the local machine copy the directory /docker-volumes/ghost to the remote server guybrush using the user carloscaballero into the directory /docker-volumes/ghost

rdiff-backup carloscaballero@guybrush::/docker-volumes/ghost luisgarcia@lechuck::/docker-volumes/ghost 

# from the remote machine called guybrush using the user carloscaballero copy the directory /docker-volumes/ghost to the machine lechuck using the user luisgarcia into the directory /docker-volumes/ghost

When using these commands, the remote machine will probably request the user's password (for the previous commands, carloscaballero and luisgarcia respectively). You can skip this step by configuring SSH key-based authentication on the remote server.
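As a rough sketch of that setup, the commands below generate a key pair and install the public key on the remote machine. The key path and the helper name setup_backup_key are assumptions made for illustration; the remote account is the one from the examples above. A key without a passphrase is what unattended backups usually need, so keep the private key file readable only by its owner.

```shell
# Sketch: set up key-based SSH login for the backup user.
# REMOTE comes from the examples above; KEY is an assumed location.
REMOTE="carloscaballero@guybrush"
KEY="$HOME/.ssh/id_ed25519"

setup_backup_key() {
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    # Generate a key pair only if one does not exist yet.
    # -N "" means no passphrase, so cron can use the key unattended.
    [ -f "$KEY" ] || ssh-keygen -t ed25519 -N "" -f "$KEY"
    # Append the public key to the remote user's authorized_keys.
    ssh-copy-id -i "$KEY.pub" "$REMOTE"
}

# Run once by hand for each remote machine:
#   setup_backup_key
```

ssh-copy-id asks for the password one last time; after that, rdiff-backup should connect without prompting.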

The real power of this tool becomes apparent when you want to restore the information. If you list the contents of the directory in which you made your copy, you will see the contents that you previously copied and, furthermore, a directory named rdiff-backup-data. This directory is very important, since it stores the incremental backups of our data.

In other words, the backup directory holds the latest version of our data, while the incremental copies are stored in the rdiff-backup-data/increments directory.

Now imagine that I've created a file called file1.txt which contains a single sentence. I make a backup using rdiff-backup and, a few minutes later, another one. The list of files in our system is now the following:


|-- prueba
|   `-- file1.txt
`-- prueba-backup
    |-- file1.txt
    `-- rdiff-backup-data
        |-- access_control_lists.2019-01-23T11:47:36Z.snapshot
        |-- access_control_lists.2019-01-23T11:51:32Z.snapshot
        |-- access_control_lists.2019-01-23T11:52:24Z.snapshot
        |-- backup.log
        |-- chars_to_quote
        |-- current_mirror.2019-01-23T11:52:24Z.data
        |-- error_log.2019-01-23T11:47:36Z.data
        |-- error_log.2019-01-23T11:51:32Z.data
        |-- error_log.2019-01-23T11:52:24Z.data
        |-- extended_attributes.2019-01-23T11:47:36Z.snapshot
        |-- extended_attributes.2019-01-23T11:51:32Z.snapshot
        |-- extended_attributes.2019-01-23T11:52:24Z.snapshot
        |-- file_statistics.2019-01-23T11:47:36Z.data.gz
        |-- file_statistics.2019-01-23T11:51:32Z.data.gz
        |-- file_statistics.2019-01-23T11:52:24Z.data.gz
        |-- increments
        |   `-- file1.txt.2019-01-23T11:51:32Z.diff.gz
        |-- increments.2019-01-23T11:51:32Z.dir
        |-- mirror_metadata.2019-01-23T11:47:36Z.diff
        |-- mirror_metadata.2019-01-23T11:51:32Z.diff.gz
        |-- mirror_metadata.2019-01-23T11:52:24Z.snapshot.gz
        |-- session_statistics.2019-01-23T11:47:36Z.data
        |-- session_statistics.2019-01-23T11:51:32Z.data
        `-- session_statistics.2019-01-23T11:52:24Z.data

You may note that the file file1.txt has an incremental copy in the increments directory.
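If you want to try this yourself, the following sketch recreates a similar scenario in a throwaway directory. The directory names follow the listing above, while the file contents and the helper name demo_increments are made up for the example. It only runs when rdiff-backup is installed.

```shell
# Sketch: back up a directory twice, changing the file in between,
# then list the increments that rdiff-backup has stored.
demo_increments() {
    work=$(mktemp -d) || return 1
    mkdir "$work/prueba"
    echo "A single sentence." > "$work/prueba/file1.txt"
    rdiff-backup "$work/prueba" "$work/prueba-backup" || return 1
    # Sessions are timestamped to the second, so wait before the next one.
    sleep 2
    echo "A second sentence." >> "$work/prueba/file1.txt"
    rdiff-backup "$work/prueba" "$work/prueba-backup" || return 1
    # The changed file should now have a .diff.gz entry here.
    ls "$work/prueba-backup/rdiff-backup-data/increments"
    rm -rf "$work"
}

if command -v rdiff-backup >/dev/null 2>&1; then
    demo_increments
else
    echo "rdiff-backup is not installed; skipping the demo" >&2
fi
```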

Restoring backups

We can restore a copy with the rdiff-backup command, or by directly using the cp command, since the mirror is neither compressed nor has any of its metadata altered; the files are in the same state as when they were copied. Although you may use the cp command, rdiff-backup is the better choice because it makes restoration more flexible: cp can only recover the most recent version, whereas rdiff-backup can also restore earlier ones.

Restoring uses the same syntax as making a backup, with the addition of the -r (--restore-as-of) option and the time to restore to. The time specification is very flexible: acceptable time strings include intervals, like "3D64s"; w3-datetime strings, like "2002-04-26T04:22:01-07:00" (strings like "2002-04-26T04:22:01" are also acceptable, in which case rdiff-backup will use the current time zone); or ordinary dates like 2/4/1997 or 2001-04-23 (various combinations are acceptable, bearing in mind that the month must always precede the day).

For example, the following command restores the copy made on 23 January 2010.

rdiff-backup -r 2010-01-23 /directory_where_is_my_backup /directory_where_restore_my_backup

rdiff-backup -r now /directory_where_is_my_backup /directory_where_restore_my_backup # Restore the last backup
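To see the time formats side by side, the sketch below prints (without running) one restore command for each format mentioned above. The helper name restore_examples is hypothetical; the directories are the placeholders already used in this section.

```shell
# Sketch: print one restore command per accepted time format.
# Nothing is executed; the commands are only echoed for comparison.
BACKUP=/directory_where_is_my_backup
TARGET=/directory_where_restore_my_backup

restore_examples() {
    for when in 3D64s 2002-04-26T04:22:01-07:00 2002-04-26T04:22:01 \
                2/4/1997 2001-04-23 now; do
        echo "rdiff-backup -r $when $BACKUP $TARGET"
    done
}

restore_examples
```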

Removing old backups

As you already know, rdiff-backup makes incremental backups, and the accumulated increments can consume a large amount of disk space. Therefore, it is highly recommended to remove old backups (as long as you have other, more recent backups, of course).

The rdiff-backup tool has the --remove-older-than option, which removes any backups older than the time given as its argument. A good example is removing any backups older than 1 year:

rdiff-backup --remove-older-than 1Y /directory_where_is_my_backup
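One detail worth knowing: by default rdiff-backup deletes the data of only one backup session per run, and asks for --force when more than one would be removed. The sketch below wraps the command in a hypothetical helper, prune_backups, with a DRY_RUN switch (also an assumption of this example) that prints the command instead of running it.

```shell
# Sketch: prune increments older than a given age.
# With DRY_RUN=1 the command is printed instead of executed.
DRY_RUN=1

prune_backups() {
    repo="$1"
    age="${2:-1Y}"
    # --force allows several sessions to be removed in a single run.
    set -- rdiff-backup --force --remove-older-than "$age" "$repo"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

prune_backups /directory_where_is_my_backup 1Y
# prints: rdiff-backup --force --remove-older-than 1Y /directory_where_is_my_backup
```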

Filter Options

Most of the time, we need to include or exclude files from our backup. The most common options which can be used in rdiff-backup are:

  • include
  • include-filelist
  • exclude
  • exclude-filelist

There are plenty more filter options besides these. A simple example using exclude is the following:

rdiff-backup --exclude /mnt/backup / /mnt/backup

In this example we exclude /mnt/backup to avoid an infinite loop, even though rdiff-backup can automatically detect simple loops like the one above. This is just an example; in reality it would be important to exclude /proc as well.
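A fuller version of that command might look like the sketch below, which assembles one exclude option per path and prints the result. The list of excluded paths (/proc, /sys, /tmp) is an assumption about a typical system and should be adapted; full_backup_cmd is a hypothetical helper name, and the string it builds is meant for display, not for paths containing spaces.

```shell
# Sketch: build a whole-system backup command that skips
# pseudo-filesystems, temporary files and the backup target itself.
full_backup_cmd() {
    target="$1"
    cmd="rdiff-backup"
    for path in /proc /sys /tmp "$target"; do
        cmd="$cmd --exclude $path"
    done
    echo "$cmd / $target"
}

full_backup_cmd /mnt/backup
# prints: rdiff-backup --exclude /proc --exclude /sys --exclude /tmp --exclude /mnt/backup / /mnt/backup
```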

Getting information about the backup directory

There may be a time when we need information about the backup (metadata). rdiff-backup allows us to obtain this information. The most common options for this are the following:

  • list-increments
  • list-changed-since
  • list-at-time
  • compare
  • compare-at-time

Since they are quite descriptive, it isn't hard to imagine what the goal of each of the different options is. Despite this, I will show several examples applying each of them:

rdiff-backup --list-increments backup_directory/subdirectory # Lists the increments (backup sessions) stored for backup_directory/subdirectory

rdiff-backup -l backup_directory/subdirectory # Same as the previous command; -l is the short form of --list-increments

rdiff-backup --list-changed-since 5D directory/subdirectory # Lists all the files under directory/subdirectory which have changed in the last 5 days

rdiff-backup --list-at-time 5D directory/subdirectory # Lists all the files that were present in directory/subdirectory 5 days ago

rdiff-backup --compare in-directory user@host::out-directory # Compares the current files in in-directory with the latest backup in out-directory, displaying which ones have changed

rdiff-backup --compare-at-time 2W in-directory user@host::out-directory # Similar, but compares in-directory to out-directory as it was 2 weeks ago

Using in cron

A good practice is automating the backups in our system. To do this, we may use the cron service.

Before using cron, we must make sure that the script doesn't print anything during normal operation, otherwise:

  • cron will treat the output as a sign of trouble and mail it to the owner of the job
  • if there is a genuine error, it will be buried in the routine output and easy to miss
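One way to keep a cron script quiet is sketched below: a small wrapper sends everything a command prints to a log file and writes a single line to stderr only when the command fails. The helper name run_quiet and its message are assumptions of this example, not part of rdiff-backup.

```shell
# Sketch: run a command silently, append its output to a log file,
# and report on stderr only if the command fails.
run_quiet() {
    log="$1"
    shift
    "$@" >> "$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "backup step failed (exit $status), see $log" >&2
    fi
    return "$status"
}

# Example use inside a cron script:
#   run_quiet /var/log/rdiff-backup.log rdiff-backup source target
```

With this approach a successful run produces no output at all, so anything cron reports points to a real failure.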

The script which we use is the following:

#!/bin/bash
. /root/.bashrc
# Back up the paths selected in files_backup.txt; stdout and stderr both go to the log file
rdiff-backup --force --print-statistics --include-globbing-filelist /root/rdiff-backup-configuration/files_backup.txt / root@brix.qontu.com::/root/backups/carloscaballero.io > /var/log/rdiff-backup.log 2>&1
# Delete the increments that are older than one year
rdiff-backup --remove-older-than 1Y root@brix.qontu.com::/root/backups/carloscaballero.io > /var/log/rdiff-backup-remove.log 2>&1

The content of the files_backup.txt file is the following:

+ /root/ghost
- **

It is important to know that both success and error messages from the backup are saved in the same logfile, named rdiff-backup.log. Another interesting point is that I've used the filter option include-globbing-filelist, which takes a file as its argument. This file lists the paths to back up, each one prefixed with + or - to express whether it must be included or excluded. Note that the backups older than 1 year are deleted to preserve disk space.
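A slightly larger filelist could look like the one written by the sketch below. Apart from /root/ghost, every path is a hypothetical example. rdiff-backup applies the first rule that matches a file, so a narrow exclusion has to appear before the broader inclusion that would otherwise cover it.

```shell
# Sketch: write a globbing filelist to a temporary file and show it.
# All paths except /root/ghost are hypothetical examples.
filelist=$(mktemp)
cat > "$filelist" <<'EOF'
- /root/ghost/content/logs
+ /root/ghost
+ /etc/nginx
- **
EOF
cat "$filelist"
```

The final "- **" line excludes everything that none of the earlier lines matched.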

Finally, edit the cron file using the crontab -e command and add the following entry, which runs the script every day at 01:00:

0 1 * * * sh /root/rdiff-backup-configuration/rdiff-backup.sh

Conclusions

In this post I've explained the rdiff-backup tool, which allows us to make incremental backups. I've also shown you the script I use to back up my projects, which is executed by cron once a day.