Now that my backup solution has been up and running for six months, I thought I’d share a short write-up. I would only recommend running your own backup method if you are aware of the risks involved. If you’re in the market for backup software, there are many off-the-shelf products that likely offer what you need, without many of the pitfalls that come with setting something up yourself. With that disclaimer out of the way, let’s get started.
Borg backup: data deduplication
The whole process of backing up data revolves around duplication of said data, so why would you want data deduplication in your backup system? To keep the disk space usage of the backup archives in check. Data deduplication avoids storing the same data twice. Say there is a family photo in ten of your backup archives: rather than storing the photo ten times, it is much more space-efficient to store it only once for all of those archives.
Borg backup combines data deduplication with data chunking, where files are split into multiple individual data chunks. A chunk can be shared by more than one file, and changes to big files remain limited to the chunks that changed. In fact, borg doesn’t care about file locations or boundaries when processing chunks. Instead it looks at whether a data chunk is new or not (similar to how Git stores data internally). Only new chunks are transferred and count towards the size of the backup repository.
Borg backup also supports client-side encryption and compression. Encryption frees you from having to trust the external party that hosts your backup archives to keep your data confidential. Compression further increases disk space efficiency at the expense of CPU cycles (considering that backup data is fairly static, this is a good trade-off to make). Borg backup is seriously good software that comes with excellent documentation, and it’s free and open source!
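To make the idea concrete, here is a minimal sketch of a content-addressed chunk store in Python. It is a toy and not how borg is implemented: it uses fixed-size chunks and plain SHA-256 IDs to keep the example short, and `ChunkStore`, `CHUNK_SIZE`, `add_file` and the sample data are names made up for this illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example is easy to follow


class ChunkStore:
    """Toy content-addressed store: every unique chunk is kept only once."""

    def __init__(self):
        self.chunks = {}  # chunk id (SHA-256 hex digest) -> chunk bytes

    def add_file(self, data):
        """Store data and return the ordered chunk ids needed to rebuild it."""
        ids = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            chunk_id = hashlib.sha256(chunk).hexdigest()
            # Only chunks that are not known yet take up new space.
            self.chunks.setdefault(chunk_id, chunk)
            ids.append(chunk_id)
        return ids

    def read_file(self, ids):
        """Rebuild a file from its list of chunk ids."""
        return b"".join(self.chunks[chunk_id] for chunk_id in ids)

    def stored_bytes(self):
        """Total size of the unique chunks held by the store."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = ChunkStore()
monday = store.add_file(b"AAAABBBBCCCC")
tuesday = store.add_file(b"AAAAXXXXCCCC")  # only the middle chunk changed
```

Both versions can be read back in full, yet the store holds only four unique 4-byte chunks (16 bytes) instead of the 24 bytes the two files add up to.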
My backup system
Local backups
I use a push-based setup, where every system initiates its own backups and stores its archives in a remote repository. My remote repository of choice is my NAS, which is accessible over SSH (public key auth only). A candidate remote repository needs enough disk space, needs to be reachable over SSH and needs a user that can execute the borg binary. Alternatively, you can use a managed service like BorgBase.
Backed-up systems include my laptop, desktop (with personal documents), my VPS (hosting this site) and my NAS itself (for my Nextcloud and Mastodon instances). For handling the client side of things I use borgmatic, which keeps all its configuration in a single YAML file and takes care of automation via systemd. Borgmatic also supports PostgreSQL and MySQL database backups. I’ve included some excerpts of my borgmatic config at the end of this post.
A neat feature of borg is that it supports append-only backup repositories, where the system performing the backup can’t delete or alter archives from the backup repository. This keeps your backups safe in case a compromised system tries to delete all of its prior backups. Note that you can temporarily disable append-only mode to prune the backup repository; this is a manual process, however. Detailed instructions are available here.
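As an illustration, one way to enforce append-only access on the repository host is to pin each client's SSH key to a forced `borg serve --append-only` command in the `authorized_keys` file of the borg user. The sketch below only builds such a line; it assumes an OpenSSH server, and the helper name, the repository path and the truncated public key are placeholders rather than values from an actual setup.

```python
def authorized_keys_entry(repo_path, public_key):
    """Build an authorized_keys line that limits a client key to
    append-only access to a single borg repository path."""
    # Forced command: the client can only talk to "borg serve", in
    # append-only mode, restricted to the given repository path.
    command = f"borg serve --append-only --restrict-to-path {repo_path}"
    # "restrict" turns off port forwarding, PTY allocation and similar extras.
    return f'command="{command}",restrict {public_key}'


# Placeholder path and key, for illustration only.
entry = authorized_keys_entry(
    "/home/borg/repos/my-repo",
    "ssh-ed25519 AAAA... laptop",
)
print(entry)
```

Because the restriction sits with the key on the server, a compromised client cannot lift it by changing its own configuration.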
Offsite backups
As my NAS and machines are in the same disaster area, I want an offsite backup should disaster strike. You could just back up to a second remote, but I find the managed borg remotes too costly. Instead I keep a copy of my borg repository in an object storage service (remember that repositories are encrypted). Such services are inexpensive for data storage, but transfer fees can be high, so they aren’t a good fit if you expect to transfer a lot of data (this is uncommon for backup archives, which typically don’t change a lot after the initial upload).
Most object storage services offer data immutability, which safeguards objects in your buckets against (intentional) deletion or corruption. Personally, I use Backblaze B2, where I store 200 GB for 0.98 EUR per month. As a comparison, at BorgBase the cost would be 2.5 EUR per month for storing 200 GB.
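The comparison can be worked out with a few lines of Python. The figures are the ones quoted above; the calculation looks at storage only and leaves out transfer fees, and the variable and function names are just for this example.

```python
STORED_GB = 200
B2_MONTHLY_EUR = 0.98
BORGBASE_MONTHLY_EUR = 2.5


def per_gb(monthly_eur, stored_gb=STORED_GB):
    """Monthly storage price per GB, in EUR."""
    return monthly_eur / stored_gb


# Difference between the two options over a full year of storage.
yearly_difference = (BORGBASE_MONTHLY_EUR - B2_MONTHLY_EUR) * 12

print(f"B2:       {per_gb(B2_MONTHLY_EUR):.4f} EUR per GB per month")
print(f"BorgBase: {per_gb(BORGBASE_MONTHLY_EUR):.4f} EUR per GB per month")
print(f"Yearly difference: {yearly_difference:.2f} EUR")
```

For these numbers that comes to 0.0049 versus 0.0125 EUR per GB per month, or 18.24 EUR per year.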
In practice, I periodically copy the backup repositories to B2, using rclone’s B2 integration. Specifically, I’ve set up one B2 bucket per borg repo and issue the following rclone command for every repo:
rclone sync --fast-list --transfers 10 --progress /home/borg/repos/my-repo b2-europe:my-repo
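Running the command above for every repository can be wrapped in a small script. The following is a sketch under a few assumptions: all repositories live under `/home/borg/repos`, each bucket has the same name as its repository, and the rclone remote is called `b2-europe` as in the command above. The function names are made up for this example.

```python
import subprocess

REPO_ROOT = "/home/borg/repos"
REMOTE = "b2-europe"


def rclone_sync_command(repo_name):
    """Return the rclone argument list that mirrors one repository
    directory into the bucket of the same name."""
    return [
        "rclone", "sync", "--fast-list", "--transfers", "10", "--progress",
        f"{REPO_ROOT}/{repo_name}", f"{REMOTE}:{repo_name}",
    ]


def sync_all(repo_names, run=subprocess.run):
    """Sync every repository in turn; check=True stops at the first failure."""
    for name in repo_names:
        run(rclone_sync_command(name), check=True)
```

Calling `sync_all(["my-desktop-repo", "my-server-repo"])` from a timer or cron job would then upload both repositories one after the other. The `run` parameter exists only so the function can be exercised without actually invoking rclone.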
Should disaster strike, I can retrieve the repository files from B2, rebuild my repository and restore using borg. Note that this requires access to both B2 (including 2FA, if set) and to the key material used to encrypt the borg repository (e.g. the repository key file plus passphrase). So you’ll have to store these safely, as they can’t be recovered from the backup repository. I just keep them in my password manager.
Closing remarks
A good practice for backup systems is to make sure that you are able to restore your data. The best method is to actually test restoring your data for different scenarios: from the local location (e.g. in case of a disk failure) and from the remote location (when the local locations have been lost). Say your house burned down and all your local machines are lost. Are you able to restore from the remote backup location?
Some improvements remain possible: e.g. monitoring for missing machines (say the backup process silently fails on one of your machines) and automatic reporting. Currently these are still manual processes and therefore prone to being overlooked.
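A check for missing machines could start out as small as the sketch below. It assumes that the time of the newest archive per repository has already been collected somehow (for example from borg's archive listing), which is left out here; the function name, the two-day threshold and the sample dates are all invented for the illustration.

```python
from datetime import datetime, timedelta


def stale_repositories(last_backup, now, max_age=timedelta(days=2)):
    """Return the sorted names of repositories whose newest archive
    is older than max_age."""
    return sorted(
        name for name, finished in last_backup.items()
        if now - finished > max_age
    )


# Made-up timestamps: the server has not backed up for over four days.
last_backup = {
    "my-desktop-repo": datetime(2024, 1, 10, 3, 0),
    "my-server-repo": datetime(2024, 1, 6, 3, 0),
}
stale = stale_repositories(last_backup, now=datetime(2024, 1, 10, 12, 0))
print(stale)
```

With the sample data only `my-server-repo` is reported, and that list could then be mailed out or pushed to whatever alerting channel is at hand.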
I’m open to any suggestions about my setup. Do you think it can be improved? Please let me know!
Borgmatic config excerpts
Desktop system:
# Where to look for files to backup, and where to store those backups.
# See https://borgbackup.readthedocs.io/en/stable/quickstart.html and
# https://borgbackup.readthedocs.io/en/stable/usage/create.html
# for details.
location:
    # List of source directories to backup (required). Globs and
    # tildes are expanded.
    source_directories:
        - /home
        - /mnt/data/Documents
        - /etc
        - /root

    # Paths to local or remote repositories (required). Tildes are
    # expanded. Multiple repositories are backed up to in
    # sequence. See ssh_command for SSH options like identity file
    # or port.
    repositories:
        - borg@nas:~/repos/my-desktop-repo

    # Any paths matching these patterns are excluded from backups.
    # Globs and tildes are expanded. See the output of "borg help
    # patterns" for more details.
    exclude_patterns:
        - '/root/.cache/'
        - '/root/.local/'
        - '/home/*/.cache/'
        - '/home/*/build/'
        - '/home/*/aur/'
        - '/home/*/abs/'
        - '/home/*/mnt/'
        - '/home/*/.local/'
        - '/home/*/.mozilla/'
        - '/home/fvdnabee/s3/'
        - '/home/fvdnabee/Downloads/scratch/'
        - '/home/fvdnabee/sshfs'
        - '/home/fvdnabee/jhbuild'
        - '/home/fvdnabee/.npm/_cacache/'
        - '/home/fvdnabee/.config/Slack/'
        - '/home/fvdnabee/.ts3client/'
        - '/home/fvdnabee/.theano/'
        - '/home/fvdnabee/.QMapShack/'
        - '/home/fvdnabee/.vim/plugged/'
        - '/home/fvdnabee/.wine'
        - '/home/fvdnabee/go'
        - '/home/fvdnabee/GIT/my-repo/data/'
        - '/home/fvdnabee/.imagej'
        - '/home/fvdnabee/.config/coc/extensions/'
        - '/home/fvdnabee/.config/Popcorn-Time/'
        - '/home/fvdnabee/.config/chromium'
        - '/home/fvdnabee/.config/browsh'
        - '*.mkv'
        - '*.torrent'
        - '*.CR2'
        - '*.pyc'
        - '*.dcm'
        - '*.npz'
        - '*.nii'
        - '*.nii.gz'

    # Exclude directories that contain a CACHEDIR.TAG file. See
    # http://www.brynosaurus.com/cachedir/spec.html for details.
    # Defaults to false.
    exclude_caches: true

    # Exclude directories that contain a file with the given
    # filenames. Defaults to not set.
    exclude_if_present:
        - pyvenv.cfg # exclude virtualenv folders
        - .nobackup

# Repository storage options. See
# https://borgbackup.readthedocs.io/en/stable/usage/create.html and
# https://borgbackup.readthedocs.io/en/stable/usage/general.html for
# details.
storage:
    # Passphrase to unlock the encryption key with. Only use on
    # repositories that were initialized with passphrase/repokey
    # encryption. Quote the value if it contains punctuation, so
    # it parses correctly. And backslash any quote or backslash
    # literals as well. Defaults to not set.
    encryption_passphrase: "secret"

    # Type of compression to use when creating archives. See
    # http://borgbackup.readthedocs.io/en/stable/usage/create.html
    # for details. Defaults to "lz4".
    compression: auto,zstd,3

# Retention policy for how many backups to keep in each category. See
# https://borgbackup.readthedocs.io/en/stable/usage/prune.html for
# details. At least one of the "keep" options is required for pruning
# to work. To skip pruning entirely, run "borgmatic create" or "check"
# without the "prune" action. See borgmatic documentation for details.
retention:
    # Number of daily archives to keep.
    keep_daily: 7

    # Number of weekly archives to keep.
    keep_weekly: 4

    # Number of monthly archives to keep.
    keep_monthly: 3

    # Number of yearly archives to keep.
    keep_yearly: 1

# Consistency checks to run after backups. See
# https://borgbackup.readthedocs.io/en/stable/usage/check.html and
# https://borgbackup.readthedocs.io/en/stable/usage/extract.html for
# details.
consistency:
    # List of one or more consistency checks to run: "repository",
    # "archives", "data", and/or "extract". Defaults to
    # "repository" and "archives". Set to "disabled" to disable
    # all consistency checks. "repository" checks the consistency
    # of the repository, "archives" checks all of the archives,
    # "data" verifies the integrity of the data within the
    # archives, and "extract" does an extraction dry-run of the
    # most recent archive. Note that "data" implies "archives".
    checks:
        - repository
        - archives
Server system:
This server runs two WordPress websites, with files under /var/www and two MariaDB databases. As borgmatic runs as the root user, it has access to the databases.
# Where to look for files to backup, and where to store those backups.
# See https://borgbackup.readthedocs.io/en/stable/quickstart.html and
# https://borgbackup.readthedocs.io/en/stable/usage/create.html
# for details.
location:
    # List of source directories to backup (required). Globs and
    # tildes are expanded.
    source_directories:
        - /root
        - /etc
        - /var/www
        - /var/log/syslog*

    # Paths to local or remote repositories (required). Tildes are
    # expanded. Multiple repositories are backed up to in
    # sequence. See ssh_command for SSH options like identity file
    # or port.
    repositories:
        - borg@nas:~/repos/my-server-repo

    # Any paths matching these patterns are excluded from backups.
    # Globs and tildes are expanded. See the output of "borg help
    # patterns" for more details.
    exclude_patterns:
        - /root/.cache

# Repository storage options. See
# https://borgbackup.readthedocs.io/en/stable/usage/create.html and
# https://borgbackup.readthedocs.io/en/stable/usage/general.html for
# details.
storage:
    # Passphrase to unlock the encryption key with. Only use on
    # repositories that were initialized with passphrase/repokey
    # encryption. Quote the value if it contains punctuation, so
    # it parses correctly. And backslash any quote or backslash
    # literals as well. Defaults to not set.
    encryption_passphrase: "secret"

    # Type of compression to use when creating archives. See
    # http://borgbackup.readthedocs.io/en/stable/usage/create.html
    # for details. Defaults to "lz4".
    compression: auto,zstd,3

# Retention policy for how many backups to keep in each category. See
# https://borgbackup.readthedocs.io/en/stable/usage/prune.html for
# details. At least one of the "keep" options is required for pruning
# to work. To skip pruning entirely, run "borgmatic create" or "check"
# without the "prune" action. See borgmatic documentation for details.
retention:
    # Number of daily archives to keep.
    keep_daily: 7

    # Number of weekly archives to keep.
    keep_weekly: 4

    # Number of monthly archives to keep.
    keep_monthly: 6

    # Number of yearly archives to keep.
    keep_yearly: 1

# Shell commands, scripts, or integrations to execute at various
# points during a borgmatic run. IMPORTANT: All provided commands and
# scripts are executed with user permissions of borgmatic. Do not
# forget to set secure permissions on this configuration file (chmod
# 0600) as well as on any script called from a hook (chmod 0700) to
# prevent potential shell injection or privilege escalation.
hooks:
    # List of one or more MySQL/MariaDB databases to dump before
    # creating a backup, run once per configuration file. The
    # database dumps are added to your source directories at
    # runtime, backed up, and removed afterwards. Requires
    # mysqldump/mysql commands (from either MySQL or MariaDB). See
    # https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html or
    # https://mariadb.com/kb/en/library/mysqldump/ for details.
    mysql_databases:
        # Database name (required if using this hook). Or
        # "all" to dump all databases on the host. Note
        # that using this database hook implicitly enables
        # both read_special and one_file_system (see
        # above) to support dump and restore streaming.
        - name: wp_mywordpress
        - name: wp_secondwordpress