diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index f337eb18a..fc3ba97ce 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -77,7 +77,7 @@ don't have a particular meaning (except for the Manifest_).
 
 Normally the keys are computed like this::
 
-  key = id = id_hash(unencrypted_data)
+  key = id = id_hash(plaintext_data) # plain = not encrypted, not compressed, not obfuscated
 
 The id_hash function depends on the :ref:`encryption mode `.
 
@@ -98,15 +98,15 @@ followed by a number of log entries. Each log entry consists of (in this order):
 
 * crc32 checksum (uint32):
   - for PUT2: CRC32(size + tag + key + digest)
-  - for PUT: CRC32(size + tag + key + data)
+  - for PUT: CRC32(size + tag + key + payload)
   - for DELETE: CRC32(size + tag + key)
   - for COMMIT: CRC32(size + tag)
 * size (uint32) of the entry (including the whole header)
 * tag (uint8): PUT(0), DELETE(1), COMMIT(2) or PUT2(3)
 * key (256 bit) - only for PUT/PUT2/DELETE
-* data (size - 41 bytes) - only for PUT
-* xxh64 digest (64 bit) = XXH64(size + tag + key + data) - only for PUT2
-* data (size - 41 - 8 bytes) - only for PUT2
+* payload (size - 41 bytes) - only for PUT
+* xxh64 digest (64 bit) = XXH64(size + tag + key + payload) - only for PUT2
+* payload (size - 41 - 8 bytes) - only for PUT2
 
 PUT2 is new since repository version 2. For new log entries PUT2 is used.
 PUT is still supported to read version 1 repositories, but not generated any more.
@@ -116,7 +116,7 @@ version 2+.
 Those files are strictly append-only and modified only once.
 
 When an object is written to the repository a ``PUT`` entry is written
-to the file containing the object id and data. If an object is deleted
+to the file containing the object id and payload. If an object is deleted
 a ``DELETE`` entry is appended with the object id.
 
 A ``COMMIT`` tag is written when a repository transaction is
@@ -130,13 +130,42 @@ partial/uncommitted transaction.
 The size of individual segments is limited to 4 GiB, since the offset of entries
 within segments is stored in a 32-bit unsigned integer in the repository index.
 
-Objects
-~~~~~~~
+Objects / Payload structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All data (the manifest, archives, archive item stream chunks and file data
+chunks) is compressed, optionally obfuscated and encrypted. This produces some
+additional metadata (size and compression information), which is separately
+serialized and also encrypted.
+
+See :ref:`data-encryption` for a graphic outlining the anatomy of the encryption in Borg.
+What you see at the bottom there is done twice: once for the data and once for the metadata.
+
+An object (the payload part of a segment file log entry) must be like:
+
+- length of encrypted metadata (16bit unsigned int)
+- encrypted metadata (incl. encryption header), when decrypted:
+
+  - msgpacked dict with:
+
+    - ctype (compression type 0..255)
+    - clevel (compression level 0..255)
+    - csize (overall compressed (and maybe obfuscated) data size)
+    - psize (only when obfuscated: payload size without the obfuscation trailer)
+    - size (uncompressed size of the data)
+- encrypted data (incl. encryption header), when decrypted:
+
+  - compressed data (with an optional all-zero-bytes obfuscation trailer)
+
+This new, more complex repo v2 object format was implemented to be able to efficiently
+query the metadata without having to read, transfer and decrypt the (usually much bigger)
+data part.
+
+The metadata is encrypted to not disclose potentially sensitive information that could be
+used for e.g. fingerprinting attacks.
+
+The compression `ctype` and `clevel` is explained in :ref:`data-compression`.
 
-All objects (the manifest, archives, archive item streams chunks and file data
-chunks) are encrypted and/or compressed. See :ref:`data-encryption` for a
-graphic outlining the anatomy of an object in Borg. The `type` for compression
-is explained in :ref:`data-compression`.
 
 Index, hints and integrity
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -855,7 +884,7 @@ For each borg invocation, a new sessionkey is derived from the borg key material
 and the 48bit IV starts from 0 again (both ciphers internally add a 32bit counter
 to our IV, so we'll just count up by 1 per chunk).
 
-The chunk layout is best seen at the bottom of this diagram:
+The encryption layout is best seen at the bottom of this diagram:
 
 .. figure:: encryption-aead.png
     :figwidth: 100%
@@ -954,14 +983,14 @@ representation of the repository id.
 Compression
 -----------
 
-Borg supports the following compression methods, each identified by a type
-byte:
+Borg supports the following compression methods, each identified by a ctype value
+in the range between 0 and 255 (and augmented by a clevel 0..255 value for the
+compression level):
 
 - none (no compression, pass through data 1:1), identified by 0x00
 - lz4 (low compression, but super fast), identified by 0x01
 - zstd (level 1-22 offering a wide range: level 1 is lower compression and high
-  speed, level 22 is higher compression and lower speed) - since borg 1.1.4,
-  identified by 0x03
+  speed, level 22 is higher compression and lower speed) - identified by 0x03
 - zlib (level 0-9, level 0 is no compression [but still adding zlib overhead],
   level 1 is low, level 9 is high compression), identified by 0x05
 - lzma (level 0-9, level 0 is low, level 9 is high compression), identified
diff --git a/docs/internals/encryption-aead.odg b/docs/internals/encryption-aead.odg
index a28a63b21..6b9153a33 100644
Binary files a/docs/internals/encryption-aead.odg and b/docs/internals/encryption-aead.odg differ
diff --git a/docs/internals/encryption-aead.png b/docs/internals/encryption-aead.png
index 1bcfbd178..5b062c844 100644
Binary files a/docs/internals/encryption-aead.png and b/docs/internals/encryption-aead.png differ
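
Review note on the "Objects / Payload structure" section added by this patch: the layout it describes (a 16-bit length, then the encrypted metadata, then the encrypted data) can be illustrated with a small sketch. This is not Borg's implementation. The helper names `pack_object` and `split_object` are hypothetical, both blobs are treated as opaque bytes (no encryption or msgpack is performed), and the little-endian byte order of the length field is an assumption, since the patch text only says "16bit unsigned int".

```python
import struct
from typing import Tuple

# Assumption: the 16-bit metadata length is stored little-endian.
# The documentation text only states "16bit unsigned int".
META_LEN = struct.Struct("<H")


def pack_object(encrypted_meta: bytes, encrypted_data: bytes) -> bytes:
    """Build a payload: metadata length, encrypted metadata, encrypted data.

    Hypothetical helper; both arguments are opaque, already-encrypted blobs.
    """
    if len(encrypted_meta) > 0xFFFF:
        raise ValueError("encrypted metadata does not fit a 16-bit length")
    return META_LEN.pack(len(encrypted_meta)) + encrypted_meta + encrypted_data


def split_object(payload: bytes) -> Tuple[bytes, bytes]:
    """Return (encrypted_meta, encrypted_data) without decrypting either part.

    Hypothetical helper; it only slices the payload at the boundaries given
    by the length prefix.
    """
    if len(payload) < META_LEN.size:
        raise ValueError("payload too short for the length prefix")
    (meta_len,) = META_LEN.unpack_from(payload)
    meta_end = META_LEN.size + meta_len
    if meta_end > len(payload):
        raise ValueError("metadata length exceeds payload size")
    return payload[META_LEN.size:meta_end], payload[meta_end:]


if __name__ == "__main__":
    # Placeholder bytes stand in for the two encrypted blobs.
    obj = pack_object(b"META", b"DATA-BYTES")
    meta, data = split_object(obj)
    print(len(obj), meta, data)
```

Because the metadata blob sits at the front with an explicit length, a reader that only needs `ctype`, `clevel` or the sizes can stop after the first `2 + meta_len` bytes, which matches the motivation the patch gives for the new format.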