Automated backups using rdiff-backup
One day your blog, your code or pretty much anything else may crash and, sadly, your most valuable information could be irredeemably lost! Consider the consequences if this ever happens (touch wood!). Pictured them? Scary, right? Now just imagine how relaxed you would be instead, if only you'd bothered to make a backup.
What is it?
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.
rdiff-backup is available in the repositories of the major Linux distributions. In my case, I'm using an Arch Linux-based distribution (Manjaro) and yay (Yet another Yogurt, an AUR helper written in Go) to install the tool.
If you use another distribution, you can install it with its package manager, for example with apt-get on Debian-based systems or yum on Red Hat-based ones:
apt-get install rdiff-backup
yum install rdiff-backup
Making backups with rdiff-backup is very easy. You can picture this tool as similar to the cp command; in other words, rdiff-backup takes two arguments:
- source directory.
- target directory.
Either directory can be local or on a remote machine. For example, to use rdiff-backup with local directories, you would use the following command (the general form, then a concrete example):
rdiff-backup source target
rdiff-backup my_personal_directory my_personal_directory_backup
In the same way, if either directory is on a remote server, you only need to write its path in the form user@server::PATH (note the double colon, unlike the single colon used by scp). The following commands show how remote and local locations can be used as both source and target:
# From the remote machine guybrush, as user carloscaballero, copy /docker-volumes/ghost to the local directory /mnt/backup/carloscaballero
rdiff-backup carloscaballero@guybrush::/docker-volumes/ghost /mnt/backup/carloscaballero
# From the local machine, copy /docker-volumes/ghost to /docker-volumes/ghost on the remote server guybrush, as user carloscaballero
rdiff-backup /docker-volumes/ghost carloscaballero@guybrush::/docker-volumes/ghost
# From guybrush (user carloscaballero), copy /docker-volumes/ghost to /docker-volumes/ghost on the machine lechuck (user luisgarcia)
rdiff-backup carloscaballero@guybrush::/docker-volumes/ghost luisgarcia@lechuck::/docker-volumes/ghost
When you run these commands, the remote machine will probably ask for the user's password (for the previous commands, carloscaballero and luisgarcia respectively). You can skip this step by configuring SSH key-based authentication on the Linux server.
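If you call these commands from your own scripts, small wrappers can keep the USER@HOST::PATH syntax, with its double colon, in one place. This is a minimal sketch, not part of rdiff-backup: the function names `remote_spec`, `pull_backup` and `push_backup` and the `RDIFF_BACKUP` variable (set it to `echo` for a dry run that only prints the arguments) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around rdiff-backup's remote syntax USER@HOST::PATH.
# Set RDIFF_BACKUP=echo to print the arguments instead of running rdiff-backup.

# Build a remote location; rdiff-backup separates host and path with "::".
remote_spec() {
  local user=$1 host=$2 path=$3
  printf '%s@%s::%s\n' "$user" "$host" "$path"
}

# Pull: back up a directory on a remote machine into a local repository.
pull_backup() {
  local user=$1 host=$2 remote_dir=$3 local_repo=$4
  "${RDIFF_BACKUP:-rdiff-backup}" "$(remote_spec "$user" "$host" "$remote_dir")" "$local_repo"
}

# Push: back up a local directory into a repository on a remote machine.
push_backup() {
  local local_dir=$1 user=$2 host=$3 remote_repo=$4
  "${RDIFF_BACKUP:-rdiff-backup}" "$local_dir" "$(remote_spec "$user" "$host" "$remote_repo")"
}
```

For instance, `pull_backup carloscaballero guybrush /docker-volumes/ghost /mnt/backup/carloscaballero` runs the same command as the first example above.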
The real power of this tool becomes apparent when you need to restore information. If you list the contents of the directory where you made your copy, you will see the contents you copied and, furthermore, a directory named rdiff-backup-data. This directory is very important, since it stores the incremental backups of our data.
In other words, the target directory holds the latest version of our backup, while the incremental copies are stored in the increments subdirectory of rdiff-backup-data.
Now imagine that I've created a file called file1.txt which contains a single sentence. A copy is made with rdiff-backup and, a few minutes later, after the file has been changed, further copies are made. Listing the files on our system now shows the following:
|-- prueba
|   `-- file1.txt
`-- prueba-backup
    |-- file1.txt
    `-- rdiff-backup-data
        |-- access_control_lists.2019-01-23T11:47:36Z.snapshot
        |-- access_control_lists.2019-01-23T11:51:32Z.snapshot
        |-- access_control_lists.2019-01-23T11:52:24Z.snapshot
        |-- backup.log
        |-- chars_to_quote
        |-- current_mirror.2019-01-23T11:52:24Z.data
        |-- error_log.2019-01-23T11:47:36Z.data
        |-- error_log.2019-01-23T11:51:32Z.data
        |-- error_log.2019-01-23T11:52:24Z.data
        |-- extended_attributes.2019-01-23T11:47:36Z.snapshot
        |-- extended_attributes.2019-01-23T11:51:32Z.snapshot
        |-- extended_attributes.2019-01-23T11:52:24Z.snapshot
        |-- file_statistics.2019-01-23T11:47:36Z.data.gz
        |-- file_statistics.2019-01-23T11:51:32Z.data.gz
        |-- file_statistics.2019-01-23T11:52:24Z.data.gz
        |-- increments
        |   `-- file1.txt.2019-01-23T11:51:32Z.diff.gz
        |-- increments.2019-01-23T11:51:32Z.dir
        |-- mirror_metadata.2019-01-23T11:47:36Z.diff
        |-- mirror_metadata.2019-01-23T11:51:32Z.diff.gz
        |-- mirror_metadata.2019-01-23T11:52:24Z.snapshot.gz
        |-- session_statistics.2019-01-23T11:47:36Z.data
        |-- session_statistics.2019-01-23T11:51:32Z.data
        `-- session_statistics.2019-01-23T11:52:24Z.data
You may notice that the file file1.txt has an incremental copy in the increments directory (file1.txt.2019-01-23T11:51:32Z.diff.gz).
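Because increments are ordinary files named after the original file plus a timestamp and a type suffix, you can peek at them with standard shell tools. A minimal sketch, assuming the default layout shown in the listing above (rdiff-backup-data/increments/<path>.<timestamp>.<type>); the function name `list_file_increments` is hypothetical, and `rdiff-backup --list-increments` remains the supported way to get this information.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the increment files kept for one path of a backup
# repository, assuming the layout rdiff-backup-data/increments/<path>.<timestamp>.<type>.
# Names are printed in glob order, which for these ISO timestamps is oldest first.
list_file_increments() {
  local repo=$1 rel_path=$2
  local inc_dir="$repo/rdiff-backup-data/increments"
  local f found=0
  # Only match names followed by a YYYY-MM-DD timestamp, so that file1.txt does
  # not also pick up increments of a different file such as file1.txt.bak.
  for f in "$inc_dir/$rel_path".[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T*; do
    [[ -e $f ]] || continue   # the pattern matched nothing
    printf '%s\n' "${f#"$inc_dir/"}"
    found=1
  done
  if (( ! found )); then
    echo "no increments found for $rel_path" >&2
    return 1
  fi
}
```

Run against the example above, `list_file_increments prueba-backup file1.txt` would print file1.txt.2019-01-23T11:51:32Z.diff.gz.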
We can restore a copy with the rdiff-backup command, or directly with the cp command, since the mirror is stored as ordinary, uncompressed files: they are in the same state as when they were copied. However, cp only gives you the latest version, so rdiff-backup is the better tool for restoring: it is more flexible, because it can apply the reverse diffs to recover older versions as well.
The command for restoring backups is similar to the one for making them, with the addition of the --restore-as-of (-r) option followed by the time to restore. The time is very flexible, since the accepted time strings are intervals, like "3D64s" (3 days and 64 seconds ago); w3-datetime strings, like "2002-04-26T04:22:01-07:00" (strings like "2002-04-26T04:22:01" are also accepted, in which case rdiff-backup uses the current time zone); or ordinary dates like 2/4/1997 or 2001-04-23 (various combinations are accepted, but the month must always precede the day).
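Intervals are additive: each number and unit pair is converted to seconds and the results are summed. The rdiff-backup documentation defines s, m, h, D, W, M and Y as seconds, minutes, hours, days, weeks, months and years, counting a month as 30 days and a year as 365 days. The hypothetical `interval_to_seconds` function below only illustrates that arithmetic; rdiff-backup parses these strings itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an rdiff-backup-style interval such as 3D64s
# into seconds (month = 30 days, year = 365 days, as rdiff-backup documents).
interval_to_seconds() {
  local spec=$1 total=0 num unit
  # The whole string must be one or more <number><unit> pairs.
  if [[ ! $spec =~ ^([0-9]+[smhDWMY])+$ ]]; then
    echo "invalid interval: $spec" >&2
    return 1
  fi
  while [[ $spec =~ ^([0-9]+)([smhDWMY])(.*)$ ]]; do
    num=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08" is not read as octal
    unit=${BASH_REMATCH[2]}
    spec=${BASH_REMATCH[3]}          # continue with the rest of the string
    case $unit in
      s) total=$((total + num)) ;;
      m) total=$((total + num * 60)) ;;
      h) total=$((total + num * 3600)) ;;
      D) total=$((total + num * 86400)) ;;
      W) total=$((total + num * 7 * 86400)) ;;
      M) total=$((total + num * 30 * 86400)) ;;
      Y) total=$((total + num * 365 * 86400)) ;;
    esac
  done
  echo "$total"
}
```

For example, `interval_to_seconds 3D64s` prints 259264, that is 3 × 86400 + 64.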
For example, the first command below restores the backup as it was on 23 January 2010, and the second one restores the most recent backup:
rdiff-backup -r 2010-01-23 /directory_where_is_my_backup /directory_where_restore_my_backup
rdiff-backup -r now /directory_where_is_my_backup /directory_where_restore_my_backup # Restore the last backup
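To avoid restoring over existing data by mistake, a small wrapper can check the target before calling rdiff-backup. A minimal sketch; the function name `restore_as_of` and the `RDIFF_BACKUP` dry-run variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restore BACKUP_PATH (a repository or a path inside it)
# as it was at WHEN into TARGET, refusing to touch a TARGET that already exists.
# Set RDIFF_BACKUP=echo to print the arguments instead of running rdiff-backup.
restore_as_of() {
  if (( $# != 3 )); then
    echo "usage: restore_as_of WHEN BACKUP_PATH TARGET" >&2
    return 2
  fi
  local when=$1 backup_path=$2 target=$3
  if [[ -e $target ]]; then
    echo "refusing to overwrite existing $target" >&2
    return 1
  fi
  "${RDIFF_BACKUP:-rdiff-backup}" -r "$when" "$backup_path" "$target"
}
```

Because the backup path can point at a single file inside the mirror, something like `restore_as_of 3D prueba-backup/file1.txt /tmp/file1.txt` should recover just that file as it was three days ago.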
Remove old backups
Since rdiff-backup keeps every increment, the backup directory keeps growing over time and can end up consuming a large amount of disk space. Therefore, it is highly recommended to remove old backups (as long as you have other, more recent backups, of course).
rdiff-backup provides the --remove-older-than option, which removes any backups older than the time given as its argument. A good example is removing any backups older than 1 year:
rdiff-backup --remove-older-than 1Y /directory_where_is_my_backup
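By default rdiff-backup refuses to delete more than one increment in a single --remove-older-than run and asks for --force, which matters the first time you clean up an old repository or when a scheduled cleanup has not run for a while. A minimal sketch of a cleanup wrapper; the function name `prune_backups` and the `RDIFF_BACKUP` dry-run variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove increments older than AGE (default 1Y) from a
# backup repository. --force lets rdiff-backup delete several increments at
# once; without it, it stops with an error when more than one would be removed.
# Set RDIFF_BACKUP=echo to print the arguments instead of running rdiff-backup.
prune_backups() {
  if (( $# < 1 || $# > 2 )); then
    echo "usage: prune_backups REPOSITORY [AGE]" >&2
    return 2
  fi
  local repo=$1 age=${2:-1Y}
  "${RDIFF_BACKUP:-rdiff-backup}" --force --remove-older-than "$age" "$repo"
}
```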
Most of the time, we need to include or exclude files from our backup. The most common options which can be used in rdiff-backup are:
As well as these, there are plenty more filter options for our backups, such as:
rdiff-backup --exclude /mnt/backup / /mnt/backup
In this example we exclude /mnt/backup to avoid an infinite loop (backing up / into /mnt/backup would otherwise copy the backup into itself), even though rdiff-backup can automatically detect simple loops like this one. This is just an example; in reality it would be important to exclude /proc as well.
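Extending that example, the sketch below backs up the whole system while skipping the target and a few virtual or temporary directories. The list of excluded paths is an assumption for a typical Linux system, and the function name `backup_system` and the `RDIFF_BACKUP` dry-run variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: back up the whole filesystem (/) into TARGET, skipping
# TARGET itself and some virtual or temporary directories (an assumed list for
# a typical Linux system). Files not matched by any --exclude are included.
# Set RDIFF_BACKUP=echo to print the arguments instead of running rdiff-backup.
backup_system() {
  if (( $# != 1 )); then
    echo "usage: backup_system TARGET" >&2
    return 2
  fi
  local target=$1 path
  local -a args=()
  for path in "$target" /proc /sys /dev /run /tmp; do
    args+=(--exclude "$path")
  done
  "${RDIFF_BACKUP:-rdiff-backup}" "${args[@]}" / "$target"
}
```

Selection options are checked in the order given and the first one that matches a file decides its fate, so order matters once --include is involved: with `--include /home/me --exclude /home`, /home/me is kept and the rest of /home is skipped, whereas the reverse order would exclude everything under /home.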
Getting information about the backup directory
There may be a time when we need information about the backup (metadata).
rdiff-backup allows us to obtain this information. The most common options for this are the following:
Since they are quite descriptive, it isn't hard to imagine what each option is for. Even so, here are several examples of each of them:
# List the increments (the times at which backups were made) stored for backup_directory/subdirectory
rdiff-backup --list-increments backup_directory/subdirectory
# -l is the short form of the same option
rdiff-backup -l backup_directory/subdirectory
# List the files under directory/subdirectory that have changed in the last 5 days
rdiff-backup --list-changed-since 5D directory/subdirectory
# List the files that were present in directory/subdirectory 5 days ago
rdiff-backup --list-at-time 5D directory/subdirectory
# Compare the current files in in-directory with the latest backup in out-directory, showing which ones have changed
rdiff-backup --compare in-directory user@host::out-directory
# Same comparison, but against out-directory as it was 2 weeks ago
rdiff-backup --compare-at-time 2W in-directory user@host::out-directory
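For monitoring, for instance a script that warns you when the nightly backup did not run, you can also read the time of the latest backup from the current_mirror.<timestamp>.data marker visible in the rdiff-backup-data listing above. This depends on rdiff-backup's internal file names, so treat it as an assumption that may not hold for every version. The function name `latest_backup_time` is hypothetical; if it finds two markers, a backup is either still running or was interrupted, so it reports an error instead of a time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the timestamp of the latest backup in REPO, read
# from the rdiff-backup-data/current_mirror.<timestamp>.data marker
# (an assumption based on rdiff-backup's on-disk layout).
latest_backup_time() {
  local repo=$1 name f
  local -a markers=()
  for f in "$repo"/rdiff-backup-data/current_mirror.*.data; do
    [[ -e $f ]] && markers+=("$f")
  done
  if (( ${#markers[@]} == 0 )); then
    echo "no current_mirror marker under $repo" >&2
    return 1
  fi
  if (( ${#markers[@]} > 1 )); then
    # More than one marker: a session is in progress or was interrupted.
    echo "several current_mirror markers under $repo; last backup may have failed" >&2
    return 1
  fi
  name=${markers[0]##*/current_mirror.}   # drop the directory and the prefix
  printf '%s\n' "${name%.data}"           # drop the .data suffix
}
```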
Using it with cron
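A good practice is to automate the backups on our system. To do this, we can use the cron service.
Before using cron, we must make sure that the script it runs doesn't print anything and writes its output to a log file instead, because cron mails any output to the crontab owner (or discards it if mail is not configured). Otherwise:
- every run that prints something is reported as if there had been an error
- if mail is not configured and there is an error, you will not be able to see it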
The script I use is the following (note that each redirection sends standard output to the log file first and then standard error to the same place):
#!/bin/bash
. /root/.bashrc
rdiff-backup --force --print-statistics --include-globbing-filelist /root/rdiff-backup-configuration/files_backup.txt / email@example.com::/root/backups/carloscaballero.io > /var/log/rdiff-backup.log 2>&1
rdiff-backup --remove-older-than 1Y email@example.com::/root/backups/carloscaballero.io > /var/log/rdiff-backup-remove.log 2>&1
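The order of redirections matters: `> file 2>&1` first points standard output at the file and then makes standard error a copy of it, whereas `2>&1 > file` copies standard error to wherever standard output pointed before (the terminal, or cron's mail), so errors would never reach the log. A minimal sketch of a hypothetical `log_run` helper that appends both streams of any command to a log, with a timestamp header, and keeps the command's exit status:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, appending its stdout and stderr to LOGFILE
# (stdout is redirected first, then stderr is duplicated onto it), and return
# the command's own exit status.
log_run() {
  local logfile=$1
  shift
  printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
  "$@" >> "$logfile" 2>&1
}
```

In a cron script it could be used as `log_run /var/log/rdiff-backup.log rdiff-backup --print-statistics SOURCE TARGET`, where SOURCE and TARGET stand for your own paths.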
The content of the files_backup.txt file is the following:
+ /root/ghost
- **
It is important to know that both the normal output and the errors of the backup are saved in the same log file, named rdiff-backup.log. Another interesting point is that I've used the include-globbing-filelist filter option, which takes a file as its argument. Each line of this file holds a path preceded by + or - to express whether it must be included or excluded: here /root/ghost is included and everything else (**) is excluded. Note that backups older than 1 year are deleted to preserve disk space.
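Since rdiff-backup applies the first matching rule, the catch-all `- **` has to be the last line of the file list. A minimal sketch of a hypothetical `write_filelist` helper that generates such a file from the directories you want to keep, so the exclusion cannot end up in the wrong place (/etc/nginx in the usage example is only an illustration):

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an rdiff-backup file list that includes each given
# directory ("+ DIR") and then excludes everything else ("- **", always last).
write_filelist() {
  local out=$1 dir
  shift
  if (( $# == 0 )); then
    echo "usage: write_filelist FILE DIR..." >&2
    return 2
  fi
  {
    for dir in "$@"; do
      printf '+ %s\n' "$dir"
    done
    printf '%s\n' '- **'
  } > "$out"
}
```

For example, `write_filelist /root/rdiff-backup-configuration/files_backup.txt /root/ghost /etc/nginx` would keep both directories and skip everything else.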
Finally, edit the crontab with the crontab -e command and add the following line, which runs the script every day at 1:00 AM:
0 1 * * * sh /root/rdiff-backup-configuration/rdiff-backup.sh
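If you prefer not to edit the crontab interactively, for example when provisioning a server from a script, the line can be appended only when it is missing. A minimal sketch with a hypothetical `merge_cron_line` filter that reads the current crontab on standard input:

```bash
#!/usr/bin/env bash
# Hypothetical filter: read a crontab on stdin and print it with LINE appended,
# unless an identical line is already present (so re-running is harmless).
merge_cron_line() {
  local line=$1 existing
  existing=$(cat)
  if [[ -n $existing ]]; then
    printf '%s\n' "$existing"
  fi
  if ! grep -qxF -- "$line" <<< "$existing"; then
    printf '%s\n' "$line"
  fi
}

# Example use (installs the job for the current user):
#   line='0 1 * * * sh /root/rdiff-backup-configuration/rdiff-backup.sh'
#   crontab -l 2>/dev/null | merge_cron_line "$line" | crontab -
```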
In this post I've explained the rdiff-backup tool, which allows us to make incremental backups. I've also shown you the script I use to back up my projects, which cron runs once a day.