Length-leaking cipher suites

The Rupture attack and related compression side-channel attacks are based on an assumption of length leaks. In this post, we give an overview of what length-leaking encryption functions are.

In the setting of TLS, which is what Rupture attacks, symmetric cryptography is used to encrypt data in the air. While the symmetric key negotiation is performed using public-key cryptography, for example Diffie-Hellman, the actual payloads such as the HTML code, Javascript code, cookies, and CSRF tokens are encrypted using symmetric ciphers employing the secret session key exchanged earlier.

All practical symmetric ciphers are length-leaking. There are two classes of symmetric ciphers that have historically been used in TLS: Block ciphers and stream ciphers. The original BREACH attack is an attack against stream ciphers only, but Rupture notably also works against block ciphers.

Stream ciphers take a plaintext input of size n and output a ciphertext of size n. A popular stream cipher is RC4, but since a couple of years its use has been discontinued due to security issues. The original BREACH attack worked against RC4. It is clear that stream ciphers can encrypt any length of plaintext.

Block ciphers on the other hand take a fixed-size input and produce a fixed-size output. The popularly used block cipher in TLS is AES, which takes a 128-bit input and produces a 128-bit output. To encrypt text that is less than 128-bits, there is some padding introduced so that the plaintext becomes exactly 128-bits long. If the plaintext is more than 128-bits, it is split up into 128-bit segments, the last of which is padded. The way block ciphers are used to encrypt plaintext that has been split up is called a mode of operation.

If we use E to denote the cipher and m to denote the plaintext, then c = E(m) is the ciphertext. When we say that a cipher leaks the length of the plaintext, we mean that the length of the ciphertext is approximately equal to the length of the plaintext, i.e. |c| ≈ |m|. If we take |m| to mean the number of bits in the text, then more concretely, we have for stream ciphers:

8⌈|c| / 8⌉ = 8⌈|m| / 8⌉

This means that the exact number of bytes in the plaintext equals the number of bytes in the ciphertext.

And for block ciphers we have:

128⌈|c| / 128⌉ = 128⌈|m + 1| / 128⌉

This means that the exact number of blocks in the plaintext equals the number of blocks in the ciphertext, with the addition of one extraneous byte. This last fact is true because TLS requires at least one byte of padding so as to be able to encode that no padding has been added.

Due to the above facts, an adversary who can see the length of a ciphertext can readily deduce the length of the respective plaintext, except for a quantization factor of 8 or 128 respectively. The larger the quantization factor, the less resolution the attacker has regarding plaintext length information. Rupture is able to attack ciphers with even large quantization factors, yet a larger factor can make the attack slower. While the quantization factor can be made arbitrarily large when designing ciphers, the penalty being paid is in bandwidth consumed to transmit the ciphertext.

Typically, when theoretically analyzing the security of a symmetric cipher, length leaking is assumed to occur and is largely ignored in the literature. It is simply something we have to accept as a fact of life. For example, consider the chosen-plaintext attack cryptographic games, and notice how it is mentioned that "The adversary outputs a pair of messages m0, m1 of the same length" – this is exactly length leak at play here. The CPA security definition does not protect against length leaks!

This length leak is the major ingredient in establishing compression side-channel attacks. It is suspected in the theoretical literature of cryptography that ciphers of unbounded input length must necessarily leak some information about the plaintext length.