
Additional Notes
----------------

Here are miscellaneous notes about topics that might not be covered in enough detail in the usage section.

.. _chunker-params:

``--chunker-params``
~~~~~~~~~~~~~~~~~~~~

The chunker params influence how input files are cut into pieces (chunks)
which are then considered for deduplication. They also have a big impact on
resource usage (RAM and disk space), as the amount of resources needed is
(also) determined by the total number of chunks in the repository (see
:ref:`cache-memory-usage` for details).

``--chunker-params=buzhash,10,23,16,4095`` results in fine-grained deduplication
and creates a large number of chunks and thus uses a lot of resources to manage
them. This is good for relatively small data volumes and if the machine has a
good amount of free RAM and disk space.

``--chunker-params=buzhash,19,23,21,4095`` (default) results in coarse-grained
deduplication and creates a much smaller number of chunks and thus uses fewer
resources. This is good for relatively big data volumes and if the machine has
a relatively low amount of free RAM and disk space.
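
As a rough back-of-the-envelope sketch (not borg's exact accounting): the first
three numeric buzhash parameters are exponents of two for the minimum, maximum
and target chunk size. Assuming chunks come out at about the target size, and
assuming a hypothetical 1 TiB of unique data, the two settings above compare
roughly like this:

::

    # Hypothetical data volume: 1 TiB of unique (non-deduplicating) data.
    data_bytes=$((1024 * 1024 * 1024 * 1024))

    # Assumption: the average chunk is about 2^(third parameter) bytes.
    fine_target=$((1 << 16))      # buzhash,10,23,16,4095 -> about 64 KiB
    coarse_target=$((1 << 21))    # buzhash,19,23,21,4095 -> about 2 MiB

    fine_chunks=$((data_bytes / fine_target))
    coarse_chunks=$((data_bytes / coarse_target))

    echo "fine-grained:   about $fine_chunks chunks"
    echo "coarse-grained: about $coarse_chunks chunks"
    echo "ratio:          $((fine_chunks / coarse_chunks))x"

Real chunk counts depend on the data, so treat these numbers only as an
order-of-magnitude estimate of how many chunks need to be managed.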

``--chunker-params=fixed,4194304`` results in fixed 4 MiB sized block
deduplication and is more efficient than the previous example when used for
block devices (like disks, partitions, LVM LVs) or raw disk image files.

``--chunker-params=fixed,4096,512`` results in fixed 4 KiB sized blocks,
but the first header block will only be 512 B long. This might be useful to
deduplicate files with 1 header + N fixed-size data blocks. Be careful not to
produce too many chunks (such as using a small block size for huge
files).
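
To get a feeling for "too many chunks", the following sketch counts the chunks
a fixed chunker would produce for a given file size. ``count_chunks`` is a
hypothetical helper written for this illustration, not a borg command, and the
file sizes are made up:

::

    # count_chunks SIZE BLOCK [HEADER]
    # Number of chunks for a file of SIZE bytes, cut into BLOCK-sized pieces,
    # optionally preceded by one HEADER-sized first chunk.
    count_chunks() {
        size=$1; block=$2; header=${3:-0}
        chunks=0
        if [ "$header" -gt 0 ] && [ "$size" -gt 0 ]; then
            chunks=1
            if [ "$size" -gt "$header" ]; then
                size=$((size - header))
            else
                size=0
            fi
        fi
        # Round up: a trailing partial block still counts as a chunk.
        echo $((chunks + (size + block - 1) / block))
    }

    count_chunks 4096512 4096 512          # 512 B header + 1000 blocks of 4 KiB
    count_chunks $((1 << 40)) 4194304      # 1 TiB image, 4 MiB blocks
    count_chunks $((1 << 40)) 4096         # 1 TiB image, 4 KiB blocks

For the 1 TiB image, the 4 KiB block size yields 1024 times as many chunks as
the 4 MiB block size, which is the situation the warning above is about.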

If you already have made some archives in a repository and you then change
chunker params, this of course impacts deduplication as the chunks will be
cut differently.

In the worst case (all files are big and were touched in between backups), this
will store all content into the repository again.

Usually, it is not that bad though:

- usually most files are not touched, so it will just re-use the old chunks
  it already has in the repo
- files smaller than the (both old and new) minimum chunksize result in only
  one chunk anyway, so the resulting chunks are the same and deduplication will apply

If you switch chunker params to save resources for an existing repo that
already has some backup archives, you will see an increasing effect over time,
when more and more files have been touched and stored again using the bigger
chunksize **and** all references to the smaller older chunks have been removed
(by deleting / pruning archives).

If you want to see an immediate big effect on resource usage, it is better to
start a new repository when changing chunker params.

For more details, see :ref:`chunker_details`.


``--noatime / --noctime``
~~~~~~~~~~~~~~~~~~~~~~~~~

You can use these ``borg create`` options to not store the respective timestamp
into the archive, in case you do not really need it.

Besides saving a little space by not archiving the timestamp, it might also
affect metadata stream deduplication: if only this timestamp changes between
backups and is stored into the metadata stream, the metadata stream chunks
won't deduplicate just because of that.

``--nobsdflags / --noflags``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use this to not query and store (or not extract and set) flags, in case
you don't need them or if they are broken somehow for your filesystem.

On Linux, dealing with the flags needs some additional syscalls. Especially when
dealing with lots of small files, this causes a noticeable overhead, so you can
use this option also for speeding up operations.

``--umask``
~~~~~~~~~~~

borg uses a safe default umask of 077 (that means the files borg creates have
only permissions for the owner, but no permissions for group and others) - so there
should rarely be a need to change the default behaviour.

This option only affects the process to which it is given. Thus, when you run
borg in client/server mode and you want to change the behaviour on the server
side, you need to use ``borg serve --umask=XXX ...`` as an SSH forced command
in ``authorized_keys``. The ``--umask`` value given on the client side is
**not** transferred to the server side.

Also, if you choose to use the ``--umask`` option, always be consistent and use
the same umask value so you do not create a mix-up of permissions in a borg
repository or with other files borg creates.
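
The effect of a umask of 077 can be seen with plain shell tools. This is only
an illustration of the umask mechanism in a scratch directory, not something
borg itself runs:

::

    workdir=$(mktemp -d)
    (
        # The umask only applies inside this subshell.
        umask 077
        touch "$workdir/file"
        mkdir "$workdir/dir"
    )

    # Keep only the type and permission characters of the ls output.
    file_mode=$(ls -ld "$workdir/file" | cut -c1-10)
    dir_mode=$(ls -ld "$workdir/dir" | cut -c1-10)
    echo "file: $file_mode"    # -rw-------
    echo "dir:  $dir_mode"     # drwx------

    rm -r "$workdir"

Group and others get no permissions at all, which is why mixing different
umask values within one repository leads to inconsistent access rights.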

``--read-special``
~~~~~~~~~~~~~~~~~~

The ``--read-special`` option is special - you do not want to use it for normal
full-filesystem backups, but rather after carefully picking some targets for it.

The option ``--read-special`` triggers special treatment for block and char
device files as well as FIFOs. Instead of storing them as such a device (or
FIFO), they will get opened, their content will be read and in the backup
archive they will show up like a regular file.

Symlinks will also get special treatment if (and only if) they point to such
a special file: instead of storing them as a symlink, the target special file
will get processed as described above.

One intended use case of this is backing up the contents of one or multiple
block devices, like e.g. LVM snapshots or inactive LVs or disk partitions.

You need to be careful about what you include when using ``--read-special``,
e.g. if you include ``/dev/zero``, your backup will never terminate.

Restoring such files' content is currently only supported one at a time via
the ``--stdout`` option (and you have to redirect stdout to wherever it shall go,
maybe directly into an existing device file of your choice or indirectly via
``dd``).

To some extent, mounting a backup archive with the backups of special files
via ``borg mount`` and then loop-mounting the image files from inside the mount
point will work. If you plan to access a lot of data in there, it will likely
scale and perform better if you do not work via the FUSE mount.

Example
+++++++

Imagine you have made some snapshots of logical volumes (LVs) you want to back up.

.. note::

    For some scenarios, this is a good method to get "crash-like" consistency
    (I call it crash-like because it is the same as you would get if you just
    hit the reset button or your machine would abruptly and completely crash).
    This is better than no consistency at all and a good method for some use
    cases, but likely not good enough if you have databases running.

Then you create a backup archive of all these snapshots. The backup process will
see a "frozen" state of the logical volumes, while the processes working in the
original volumes continue changing the data stored there.

You also add the output of ``lvdisplay`` to your backup, so you can see the LV
sizes in case you ever need to recreate and restore them.

After the backup has completed, you remove the snapshots again.

::

    $ # create snapshots here
    $ lvdisplay > lvdisplay.txt
    $ borg create --read-special /path/to/repo::arch lvdisplay.txt /dev/vg0/*-snapshot
    $ # remove snapshots here

Now, let's see how to restore some LVs from such a backup.

::

    $ borg extract /path/to/repo::arch lvdisplay.txt
    $ # create empty LVs with correct sizes here (look into lvdisplay.txt).
    $ # we assume that you created an empty root and home LV and overwrite it now:
    $ borg extract --stdout /path/to/repo::arch dev/vg0/root-snapshot > /dev/vg0/root
    $ borg extract --stdout /path/to/repo::arch dev/vg0/home-snapshot > /dev/vg0/home


.. _separate_compaction:

Separate compaction
~~~~~~~~~~~~~~~~~~~

Borg does not auto-compact the segment files in the repository at commit time
(at the end of each repository-writing command) any more.

This is new since borg 1.2.0 and requires borg >= 1.2.0 on client and server.

This causes a similar behaviour of the repository as if it was in append-only
mode (see below) most of the time (until ``borg compact`` is invoked or an
old client triggers auto-compaction).

This has some notable consequences:

- repository space is not freed immediately when deleting / pruning archives
- commands finish quicker
- repository is more robust and might be easier to recover after damages (as
  it contains data in a more sequential manner, historic manifests, multiple
  commits - until you run ``borg compact``)
- user can choose when to run compaction (it should be done regularly, but not
  necessarily after each single borg command)
- user can choose from where to invoke ``borg compact`` to do the compaction
  (from client or from server, it does not need a key)
- less repo sync data traffic in case you create a copy of your repository by
  using a sync tool (like rsync, rclone, ...)

You can manually run compaction by invoking the ``borg compact`` command.

See :ref:`rollback_transaction` for how to undo changes if you have not run
compaction yet.

.. _append_only_mode:

Append-only mode (forbid compaction)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A repository can be made "append-only", which means that Borg will never
overwrite or delete committed data (append-only refers to the segment files,
but borg will also refuse to delete the repository completely).

Please note that this only affects the low level structure of the repository,
and running ``borg delete`` or ``borg prune`` or reading from the repository will
still be allowed.

If the ``borg compact`` command is used on a repo in append-only mode, there
will be no warning or error, but no compaction will happen.

Append-only is useful for scenarios where a backup client machine backs up
remotely to a backup server using ``borg serve``, since a hacked client machine
cannot delete backups on the server permanently.

To activate append-only mode, set ``append_only`` to 1 in the repository config:

::

    borg config /path/to/repo append_only 1

Note that you can go back-and-forth between normal and append-only operation with
``borg config``; it's not a "one way trip."

In append-only mode Borg will create a transaction log in the ``transactions`` file,
where each line is a transaction and a UTC timestamp.

See :ref:`rollback_transaction` for how to use this log to roll back the
repository to an earlier state.

In addition, ``borg serve`` can act as if a repository is in append-only mode with
its option ``--append-only``. This can be very useful for fine-tuning access control
in ``.ssh/authorized_keys``:

::

    command="borg serve --append-only ..." ssh-rsa <key used for not-always-trustworthy backup clients>
    command="borg serve ..." ssh-rsa <key used for backup management>

Running ``borg init`` via a ``borg serve --append-only`` server will *not* create
an append-only repository. Running ``borg init --append-only`` creates an append-only
repository regardless of server settings.

.. _rollback_transaction:

Rolling back a transaction
~~~~~~~~~~~~~~~~~~~~~~~~~~

Borg repositories are transactional. A command either succeeds completely and
commits its changes, or it fails (or is interrupted) and the changes are
not committed.

Furthermore, since Borg 1.2.0, repository space is not freed immediately when
data is marked as deleted (e.g. when archives are deleted or pruned), because
compaction is a separate step (see :ref:`separate_compaction`). This means that
even after a successful commit that deleted data, the old data (and historic
manifests) might still be present in the repository's segment files.

If you accidentally ran a command that caused data loss (like an incorrect
``borg recreate`` or ``borg prune``), or if your repository was compromised
while in append-only mode, you can roll back the repository to a previous state,
**provided that** ``borg compact`` has not been run since then.

Rollback in append-only mode
++++++++++++++++++++++++++++

In append-only mode, the repository contains a transaction log that makes it
easy to identify stable transactions.

Suppose an attacker remotely deleted all backups, but your repository was in append-only
mode. A transaction log (the ``transactions`` file) in this situation might look like this:

::

    transaction 1, UTC time 2016-03-31T15:53:27.383532
    transaction 5, UTC time 2016-03-31T15:53:52.588922
    transaction 11, UTC time 2016-03-31T15:54:23.887256
    transaction 12, UTC time 2016-03-31T15:55:54.022540
    transaction 13, UTC time 2016-03-31T15:55:55.472564

From your security logs you conclude the attacker gained access at 15:54:00 and all
the backups were deleted or replaced by compromised backups. From the log you know
that transactions 11 and later are compromised. Note that the transaction ID is the
name of the *last* file in the transaction. For example, transaction 11 spans files 6
to 11.
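
Picking the last good transaction can also be scripted. The following is a
minimal sketch, assuming the log format shown above; the scratch file and the
variable names are made up for this illustration, and in a real repository you
would read the ``transactions`` file instead:

::

    # Sample log from above, written to a scratch file.
    log=$(mktemp)
    printf '%s\n' \
        'transaction 1, UTC time 2016-03-31T15:53:27.383532' \
        'transaction 5, UTC time 2016-03-31T15:53:52.588922' \
        'transaction 11, UTC time 2016-03-31T15:54:23.887256' \
        'transaction 12, UTC time 2016-03-31T15:55:54.022540' \
        'transaction 13, UTC time 2016-03-31T15:55:55.472564' > "$log"

    cutoff='2016-03-31T15:54:00'    # time of the break-in (UTC)

    # Timestamps in this format sort like plain strings, so a string
    # comparison is enough. Field 2 is "N," and field 5 is the timestamp.
    last_good=$(awk -v cutoff="$cutoff" '
        ($5 "") < (cutoff "") { id = $2; sub(",", "", id) }
        END { print id }' "$log")
    first_bad_segment=$((last_good + 1))

    echo "last good transaction:  $last_good"
    echo "first segment to remove: $first_bad_segment"
    rm "$log"

For the sample log this selects transaction 5, so segment files 6 and later
belong to the compromised transactions.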

In a real attack you'll likely want to keep the compromised repository
intact to analyze what the attacker tried to achieve. It's also a good idea to make this
copy just in case something goes wrong during the recovery. Since recovery is done by
deleting some files, a hard link copy (``cp -al``) is sufficient.
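
The following sketch shows why a hard link copy is enough here. It uses a
scratch directory as a stand-in for a repository and assumes GNU ``cp`` (the
``-l`` option is not available everywhere):

::

    repo=$(mktemp -d)                 # scratch stand-in for a repository
    mkdir -p "$repo/data/0"
    echo "segment" > "$repo/data/0/6"

    backup="$repo.bak"
    cp -al "$repo" "$backup"          # hard link copy: no file data is duplicated

    rm "$repo/data/0/6"               # recovery deletes a file in the working copy ...
    kept=$(ls "$backup/data/0")       # ... but the copy still has it
    echo "still present in copy: $kept"

    rm -r "$repo" "$backup"

Note that this only protects against deleting files: both directory trees share
the same file contents, so modifying a file in place would change both.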

The first step to reset the repository to transaction 5, the last uncompromised transaction,
is to remove the ``hints.N``, ``index.N`` and ``integrity.N`` files in the repository (these
files are always expendable). In this example N is 13.

Then remove or move all segment files from the segment directories in ``data/`` starting
with file 6::

    rm data/**/{6..13}

That is all that needs to be done in the repository.

Manual rollback (Undo)
++++++++++++++++++++++

If you are not using append-only mode, you can still roll back a transaction
manually by identifying and removing the latest segment files.

.. note::
    **Make a backup of the repository directory** (e.g. using ``cp -al``)
    before doing any manual modifications! Do NOT run ``borg compact``!

1.  **Identify the segment files.**
    Look into the repository's ``data/`` directory. Segments are numbered and
    grouped into numbered subdirectories.

2.  **Identify the transaction boundaries.**
    Each borg operation that modifies the repository creates a transaction,
    ending with a commit. The last files of a transaction usually are:

    -   **Data segments**: One or more files (up to 500 MiB) containing data
        chunks (PUT tags) or deletion markers (DEL tags).
    -   **Manifest**: A small file (a few KB) containing the repository manifest.
    -   **Commit tag**: A tiny file (17 bytes) marking the end of the transaction.

3.  **Find the previous commit.**
    By looking at the segment files in reverse order (highest numbers first),
    you can identify the current transaction's commit and manifest, and then
    the commit of the *previous* transaction (another 17-byte file).

4.  **Remove the files.**
    Delete (or move away) all segment files that were created **after** the
    commit of the transaction you want to roll back to.
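
Steps 1 to 3 can be sketched with standard tools, relying on the 17-byte size
of commit files described above. The example builds a scratch directory with
fake segment files for illustration; in a real repository you would run only
the ``find`` pipeline from inside the repository directory:

::

    repo=$(mktemp -d)                            # scratch stand-in for a repository
    mkdir -p "$repo/data/0"
    head -c 17 /dev/zero > "$repo/data/0/5"      # fake commit of the previous transaction
    head -c 4096 /dev/zero > "$repo/data/0/6"    # fake data segment
    head -c 300 /dev/zero > "$repo/data/0/7"     # fake manifest segment
    head -c 17 /dev/zero > "$repo/data/0/8"      # fake commit of the current transaction

    # List files of exactly 17 bytes, highest segment number first.
    commits=$(cd "$repo" && find data -type f -size 17c |
        awk -F/ '{ print $NF, $0 }' | sort -rn | cut -d' ' -f2-)
    echo "$commits"

    rm -r "$repo"

Here the listing shows ``data/0/8`` followed by ``data/0/5``, so rolling back
the latest transaction would mean removing segments 6, 7 and 8.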

After the rollback
++++++++++++++++++

If you want to access this rolled back repository from a client that already has
a cache for this repository, the cache will reflect a newer repository state
than what you actually have in the repository now, after the rollback.

Thus, you need to clear the cache::

    borg delete --cache-only repo

The cache will get rebuilt automatically. Depending on repo size and archive
count, it may take a while.

You will also need to remove ``~/.config/borg/security/REPOID/manifest-timestamp``.

Drawbacks
+++++++++

As data is only appended, and nothing removed, commands like ``prune`` or ``delete``
won't free disk space, they merely tag data as deleted in a new transaction.

Be aware that as soon as you write to the repo in non-append-only mode (e.g. prune,
delete or create archives from an admin machine), it will remove the deleted objects
permanently (including the ones that were already marked as deleted, but not removed,
in append-only mode). Automated edits to the repository (such as a cron job running
``borg prune``) will render append-only mode moot if data is deleted.

Even if an archive appears to be available, it is possible an attacker could delete
just a few chunks from an archive and silently corrupt its data. While in append-only
mode, this is reversible, but ``borg check`` should be run before a writing/pruning
operation on an append-only repository to catch accidental or malicious corruption::

    # run without append-only mode
    borg check --verify-data repo && borg compact repo

Aside from checking repository & archive integrity you may want to also manually check
backups to ensure their content seems correct.

Further considerations
++++++++++++++++++++++

Append-only mode is not respected by tools other than Borg. ``rm`` still works on the
repository. Make sure that backup client machines only get to access the repository via
``borg serve``.

Ensure that no remote access is possible if the repository is temporarily set to normal mode
(for example, for regular pruning).

Further protections can be implemented, but are outside of Borg's scope. For example,
file system snapshots or wrapping ``borg serve`` to set special permissions or ACLs on
new data files.

SSH batch mode
~~~~~~~~~~~~~~

When running Borg using an automated script, ``ssh`` might still ask for a password,
even if there is an SSH key for the target server. Use this to make scripts more robust::

    export BORG_RSH='ssh -oBatchMode=yes'

.. _adjusting_segment_size:

Adjusting segment size
~~~~~~~~~~~~~~~~~~~~~~

By default, Borg uses a maximum segment file size of 500 MiB. This is a good
balance for many use cases, but you can adjust it to better suit your
environment:

- **Smaller segments (e.g., 50 MiB or 100 MiB)**:
  Recommended if you use tools like ``rsync`` or ``rclone`` to sync your
  repository to another location. Smaller segments result in less data being
  re-transmitted when a segment is updated (e.g., during compaction).
- **Larger segments**:
  Usually not necessary, as 500 MiB is already quite large.

You can change this setting for an existing repository:

::

    # Set maximum segment size to 100 MiB (in bytes)
    borg config /path/to/repo max_segment_size 104857600

Note that changing this setting **only affects new segments** created after the
change. Already existing segments will only be rewritten to the new size when
they are picked up by ``borg compact``.
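
Since the value is given in bytes, a small bit of shell arithmetic avoids
conversion mistakes. This sketch only prints the resulting command (with the
same placeholder repository path as above) instead of running it:

::

    mib=100                               # desired maximum segment size in MiB
    bytes=$((mib * 1024 * 1024))
    echo "borg config /path/to/repo max_segment_size $bytes"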