Measuring Diffusion in Stream Ciphers using Statistical Testing Methods

Confusion and diffusion suggested by Claude Shannon are two techniques that symmetric key ciphers should satisfy to prevent cryptanalysis. Diffusion dissipates the statistical properties of the plaintext over the whole ciphertext. For a block cipher, each bit of the output ciphertext block changes with probability one half for any flipped bit in the input plaintext block, implying the cipher to have good diffusion properties. This definition with slight modification can also be applied to stream ciphers but here it is enough to make sure the following: (i) to ensure an overall change in the output keystream with probability half for any flipped bit in the key-IV bit sequence, and (ii) to verify that every bit in the output keystream changes with probability one half for any single bit flip in the key-IV bit sequence. Here we insist on using these tests together for measuring diffusion in stream ciphers. Based on this we have examined the level of diffusion exhibited by some of the eSTREAM candidates and the result is given in this paper.

The demand on information security has increased extensively due to the sensitivity of information being exchanged over the public communication channels. Unlike block ciphers, stream ciphers have no standard model for their design and analysis, which leads to cryptographers constructing various models for stream ciphers. From the security perspective, several stream ciphers are found to be vulnerable to cryptanalysis, many of which are statistical analysis1.
In cryptography, confusion and diffusion are two properties of the operation of a secure cipher which were identified by Claude Shannon2. In a block cipher with good diffusion, if one bit of the plaintext is changed, then the ciphertext changes in an unpredictable manner. Cryptographic diffusion test is a kind of statistical test that evaluates a stream cipher or a block cipher for diffusion. The strict avalanche criterion (SAC) builds on the concepts of completeness and avalanche and was introduced by Webster and Tavares8. A statistical test for randomness called the SAC test for pseudorandom number generators was presented by Castro3, et al. Turan4 introduced some structural tests that consider the relation between key-IV and keystream and one of the tests evaluates the diffusion property in stream ciphers.

1.1   Strict Avalanche Criterion of S-boxes

A substitution box (S-box) is an important component used in different cryptographic primitives like stream ciphers, block ciphers, and cryptographic hash functions which provides high nonlinearity if used properly. An m x n S-box5,8 is a mapping of $f:{Z}_{2}^{n}\to {Z}_{2}^{m}$ that maps an n bit to m bit binary sequence.

The test for checking diffusion in stream ciphers is a mere generalisation of the definition of avalanche criterion on S-boxes8. Now we shall brief avalanche criterion on S-boxes and then relate it with diffusion property in stream ciphers. The avalanche criterion on S-boxes can be measured by the following: avalanche effect, strict avalanche criterion-r (SAC-r) and strict avalanche criterion-c (SAC-c), and completeness.

Criterion 1: Avalanche Effect

An n x m S-box, say $f:{Z}_{2}^{n}\to {Z}_{2}^{m}$ exhibits the avalanche effect if and only if

$\sum _{0\le j\le {2}^{n}-1}\text{\hspace{0.17em}}{W}_{j}=m\text{\hspace{0.17em}}{2}^{n-1}$

where ${W}_{j}=wt\left(f\left({x}_{j}\right)\oplus f\left({x}_{j}\oplus {e}_{i}^{n}\right)\right)$ for each i (‘wt’ represents the Hamming weight), ${x}_{j}\in {Z}_{2}^{n}$ and ${e}_{i}^{n}=\left(0,\text{\hspace{0.17em}}0,...,\text{\hspace{0.17em}}0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}1,\text{\hspace{0.17em}}0,...,\text{\hspace{0.17em}}0\right)$ is the 1 bit error vector of n bit length with 1 occupying the ith bit position ( $1\le i\le n$ ). This means that whenever a single input bit is complemented then on an average one half of the output bits changes (irrespective of the bit positions).
Criterion 2: Strict Avalanche Criterion-r

An n x m S-box exhibits the strict avalanche criterion-r (SAC-r) if and only if

$\sum _{0\le j\le {2}^{n}-1}\text{\hspace{0.17em}}{W}_{j}=m\text{\hspace{0.17em}}{2}^{n-1}$

where ${W}_{j}=wt\left(f\left({x}_{j}\right)\oplus f\left({x}_{j}\oplus {e}_{i}^{n}\right)\right)=\frac{m}{2}$ for each j.

This indicates that whenever a single bit in the input to the S-box is complemented then exactly half of the output bits changes. Here, SAC-r implies avalanche effect but the converse need not be true.

Criterion 3: Strict Avalanche Criterion-c

An n x m S-box exhibits the strict avalanche criterion-r if and only if

$\sum _{x\in {Z}_{2}^{n}}\text{\hspace{0.17em}}f\left(x\right)\oplus f\left(x\oplus {e}_{i}^{n}\right)=\left({2}^{n-1},\text{\hspace{0.17em}}{2}^{n-1},...,\text{\hspace{0.17em}}{2}^{n-1}\right)$ for all i, $1\le i\le n$ ( $\Sigma$ is integer addition).

This indicates that whenever a single bit in the input to the S-box is complemented then every bit in the output changes with probability of one half. Here SAC-c implies avalanche effect and the converse need not. It is to be noted that SAC-r may not imply SAC-c and vice versa.

Criterion 4: Completeness

An n x m S-box, exhibits the completeness property if and only if

$\sum _{x\in {Z}_{2}^{n}}\text{\hspace{0.17em}}f\left(x\right)\oplus f\left(x\oplus {e}_{i}^{n}\right)=\left({a}_{1},\text{\hspace{0.17em}}{a}_{2},...,\text{\hspace{0.17em}}{a}_{m}\right)$

with ai > 0 for all i, $1\le i\le m$ .

SAC-c implies completeness property but the converse need not be true. Hence we can conclude that it is enough to use SAC-r and SAC-c as the criterion for examining diffusion in a cipher.

1.2  Interpretation of SAC-r and SAC-c using Matrices

For an  n x m S-box, the truth table is a matrix M of order 2n $×$ m, where each row is an m bit output f(x)of the S-box.Define weight matrix Mwof the same order corresponding to an error vector ei where each row is $f\left(x\right)\oplus f\left(x\oplus {e}_{i}^{n}\right)$ . Hence for every n x m S-box there are a total of n weight matricesobtained for each error vectors ${e}_{i}^{n}$ ($1\le i\le n$ ). These n weight matrices are enough to measure SAC-r and SAC-c.
The S-box is to satisfy SAC-r, if the Hamming weight of each row in all the n weight matrices equals $m/2$ . Unlike SAC-r, if the Hamming weight of each columns for all n weight matrices equals 2n-1, then the S-box is to satisfy SAC-c.

Unlike block ciphers, stream ciphers are said to have good diffusion property if for any single bit flip in the key-IV bit sequence (concatenated bit sequence of key and IV in order) of the cipher corresponding to any fixed key and IV, the keystream changes with probability one half. So, it is enough to verify SAC-r and SAC-c properties for any stream cipher by considering the mapping of the key-IV bit sequence (of size n) to the corresponding output keystream (of length m). Preferably for better results the value of m should atleast n, and for simplicity one can choose m = n. Typically, the size of key-IV bit sequence for stream ciphers is atleast double the size of key, which is 80 bits for hardware oriented stream ciphers10. Testing SAC-r and SAC-c for stream ciphers by considering it as an S-box of atleast 160 bits is infeasible, since a weight matrix of order ${2}^{160}×m$ (where, m is atleast 160) is to be analysed.  But one can test diffusion on stream ciphers by choosing a sample (of size in between ${2}^{10}$ and ${2}^{20}$ )
of inputs from a total of 2n inputs which are the possible distinct key-IV bit sequences for a stream cipher. For some given input samples, the pseudocodes for measuring the level of diffusion of each bit of the key-IV bit sequence on the keystream of the cipher using SAC-r and SAC-c are given as:

2.1  SAC-r Diffusion Test

Consider the n bit length vector ei = (0,0,…..0,1,0,0….,0), where 1 occupies the ith bit position of the vector. Choose a random key-IV bit sequence (k1,k2,…..,kt,kt+1,…..,kn), where the first t bits represents the key and the remaining bits represents the IV of the stream cipher and then generate an L bit length keystream. XOR the error vector e1 with (k1,k2,…..,kt,kt+1,…..,kn) and use the resultant vector as an input to the stream cipher to generate a new L bit keystream, which is then XORed with the previously generated L bit keystream. Then store the resultant bit stream as a row of a matrix. Repeat this process for N different key-IV bit sequences. The matrix obtained is the weight matrix ${M}_{1}^{w}$ of order N X L. The Hamming weight of each L bit length row of the weight matrix is calculated. These N values follows binomial distribution B(L, ½). These values are grouped into five categories as shown in Table 1. The corresponding frequency values follows multinomial distribution with parameters N and pi (for i=1,2,...,5), where pi is the probability of an observed value to lie in the ith category. Here the multinomial experiment generalises a binomial experiment by allowing each trial to result in one of the five possible categories. The expected number of trials resulting in category i is Npi. Chi-Square goodness of fit test for binomial distribution with degree of freedom four is applied for analysing the data.

Null hypothesis:

${H}_{0}:{p}_{1}={p}_{10},\text{\hspace{0.17em}}\text{\hspace{0.17em}}{p}_{2}={p}_{20},\text{\hspace{0.17em}}{p}_{3}={p}_{30},\text{\hspace{0.17em}}{p}_{4}={p}_{40},\text{\hspace{0.17em}}{p}_{5}={p}_{50}$

Alternative hypothesis:
Ha: at least one pi does not equal pi0
The value of pi0 depends on the length of the 5 categories defined.
Test statistic value: ${\chi }^{2}=\sum _{i=1}^{5}\frac{{\left({O}_{i}-{E}_{i}\right)}^{2}}{{E}_{i}}$
Rejection region:
${\chi }_{\alpha ,4}^{2}$ is the value such that $\alpha$ of the area under the ${\chi }^{2}$ curve with d (=4) degrees of freedom lies to the right of ${\chi }_{\alpha ,4}^{2}$ . Here $\alpha$ the significance level is chosen to be 0.01.
Repeat the above process for remaining error vectors ei, i=2,…., n for the same N key-IV bit sequences and apply chi-square goodness of fit test independently, resulting in n number of p-values. Preferably all the obtained n number of p-values should exceed 0.01.

Pseudo code:
for i=1 to n
do
for  j=1to N
do
Randomly choose key-IV bit sequenceas Kj = (k1,k2,…                     ..,kt,kt+1,…..,kn)
Generate L bits of keystream Lj from the cipher using Kj
Construct ${K}_{j}^{\ast }$ =Kj$\oplus$ ei and generate L bits of keystream ${L}_{j}^{\ast }$ from the cipher using ${K}_{j}^{\ast }$
wj=Hamming weight $\left({L}_{j}\oplus {L}_{j}^{*}\right)$
Categorise the value wj
endfor
Apply chi-square goodness of fit test to the values wj
Return p-value
endfor

2.2   SAC-c Diffusion Test

Unlike SAC-r, here we find the Hamming weights of each N bit length columnof the weight matrix ${M}_{1}^{w}$ . Then we get L integer values corresponding to L columns of the weight matrix which takes values in between 0 and N (both inclusive). These values should follow binomial distribution B(L, ½). The Chi Square goodness of fit test is applied to estimate the distribution of these L values. Fixing the same N random key-IV bit sequencesand repeating the above process using all error vectors ei, i=2,….,n and applying Chi-square goodness of fit test for each, will result in n number of p-values. Preferably all n number of p-values should take value greater than or equal to 0.01.

Pseudo Code:
for i=1 to n
do
wl=0 (l=1 to L)
for j=1to N
do
Randomly choose key-IV bit sequence as Kj=(k1,k2,…..,kt, kt+1,…..,kn),
Generate L bits of keystream Lj from the cipher using Kj
Construct ${K}_{j}^{\ast }$ =Kj$\oplus$ ei and generate L bits of keystream ${L}_{j}^{\ast }$ using ${K}_{j}^{\ast }$ as key-IV bit sequence
Rj=(Lj$\oplus$ ${L}_{j}^{\ast }$ ); jth row of the weight matrix M={Mjl} of order $N×L$
for l=1 to L
wl=Mjl +wl
endfor
end for
Categorise the L values wl
Apply chi-square goodness of fit test to the values wl
Return p-value
end for

If the cipher fails the above tests corresponding to any error vector ei (i=1,2,…,n), then one can retrace the paths affected by those state register(s) occupied by the ith bit of the key-IV bit sequenceand modify the cipher accordingly.

The selection of parameters for the above tests are described here. For both SAC-r and SAC-c diffusion tests, n weight matrices each of size 216 x 220 and 220 x 216, respectively are generated corresponding to n error vectors ei (i=1, 2,…, n). For SAC-r diffusion test the Hamming weights of the 216 rows for each of the n matrices are individually grouped into 5 categories (as shown in Table 1) and are evaluated using chi-square goodness of fit test for binomial distribution.

Table 1.Weight matrics

Similarly for SAC-c diffusion test the Hamming weights of the 216 columns for each of the n matrices corresponding to n error vectors ei are individually grouped into 5 categories and are evaluated using chi-square goodness of fit test. The value of N should be larger for this test to be applicable. High or low weight values indicate poor diffusion properties of the cipher. Since larger sample size gives accurate result, one may choose N=216, L=220 for SAC-r test and N=220, L=216 for SAC-c test.
We applied the two tests on the eSTREAM candidate ciphers Trivium-806 and Grain-807. The bolded ith entry (read row wise) in Table 2 and Table 3 implies that the position(s) (or register(s)) occupied by the ith bit of the key-IV bit sequence in the internal state of the cipher need to be rechecked, as the bit occupying that position in the register does not affect the cipher to cause a change in the keystream. This indicates that these ciphers have poor diffusion for some key/IV bits.

Table 2.Trivium-80 stream cipher: SAC-r/SAC-c results

Table 3.Grain-80 stream cipher: SAC-r/SAC-c results

In this paper two new statistical testing methods (SAC-r and SAC-c) are introduced to measure the level of diffusion in the keystream of a stream cipher. A stream cipher should pass both SAC-r and SAC-c diffusion tests for all key/IV bits to confirm proper diffusion. We have noted that stream ciphers Grain-80 and Trivium-80 are not passing these tests for few Key/IV bits. Impact of above observations on the security of the ciphers need to be analysed.

1. Englund, Håkan; Johansson, Thomas & Turan, Meltem Sönmez. A framework for chosen IV statistical analysis of stream ciphers. In Progress in Cryptology– INDOCRYPT 2007 LNCS, 2007, 4859, pp. 268-81.

2. Shannon, C. Communication theory of secrecy systems. Bell Syst. Technical J., 1949, 28(4), 656-715.

3. Castro, Julio Cesar Hernandez; Sierra, Jose Maria; Seznec, Andre; Izquierdo, Antonio & Ribagorda, Arturo. The strict avalanche criterion randomness test. Mathematics and Computers in Simulation, 2005, 68(2), 1-7.

4. Turan, M.S.; Doganaksoy, A. & Calik, C. Statistical analysis of synchronous stream ciphers. SASC 2006: Stream Ciphers Revisited 2006.

5. Adams, C & Tavares, S. The structured design of cryptographically good s-boxes. Journal Cryptology, 1990, 3, 27-41.

6.  De Cannière, Christophe & Preneel, Bart. Trivium. The eSTREAM Project - eSTREAM Phase 3. http://www.ecrypt.eu.org/stream/triviump3.html [Accessed on April 4, 2011]

7.  Hell, M; Johansson, T & Willi, Meier. Grain: A stream cipher for constrained environments. Int. J. Wireless Mobile Comput., 2007, 2(1), 86-93.

8.  Webster, A & Tavares, S. On the design of S-Boxes. In Proceedings of the Advances in Cryptology-Crypto85: LNCS, Springer-Verlag, 1986, 219, pp. 523-34.

9.  Liu, Bozhong; Gong, Zheng; Qiu, Weidong & Zheng, Dong. On the security of 4-bit involutive S-boxes for lightweight designs, LNCS, 2011, 6672, pp. 247-56.

10. ECRYPT: The home page for eSTREAM. The ECRYPT Stream Cipher Project. http://www.ecrypt.eu.org/stream [Accessed on April 4, 2011].

 Mr Chungath Srinivasan received his MSc (Mathematics) from University of Calicut. Currently, he is working as a faculty in the Amrita Vishwa Vidyapeetham University, Coimbatore. Ms Lakshmy K.V. received her MSc (Mathematics) from the University of Calicut. She is a full time CSIR funded Research Scholar in Amrita Vishwa Vidyapeetham University, Coimbatore. Dr M. Sethumadhavan obtained his PhD (Number Theory) from Calicut Regional Engineering College. Currently, he is working as a Professor in the Department of Mathematics and Computer Science, Amrita Vishwa Vidyapeetham University, Coimbatore.