Parallelising Sumcheck

MLE Sumcheck

[Figure: the Boolean hypercube \(\{0,1\}^3\) with axes \(x_1, x_2, x_3\); the vertex evaluations \(f(0,0,0) = f_1,\ f(0,0,1) = f_2,\ f(0,1,0) = f_3,\ f(0,1,1) = f_4,\ f(1,0,0) = f_5,\ f(1,0,1) = f_6,\ f(1,1,0) = f_7,\ f(1,1,1) = f_8\) form the evaluation table defining the multilinear extension.]

\implies f(X_3, X_2, X_1)

MLE Sumcheck

  • MLE representation makes the sumcheck prover's computation parallelisable
  • Computing round polynomials is easy:
r_1(X) := (1-X)\textcolor{orange}{\sum_{x_2, x_1} f(0, x_2, x_1)} + X\textcolor{93c47d}{\sum_{x_2, x_1} f(1, x_2, x_1)}
  • Need to fold along a dimension in each round (a code sketch follows below):
[Figure: folding along \(x_3\): each pair \((f(0, x_2, x_1), f(1, x_2, x_1))\) of the table \(f_1, \dots, f_8\) collapses to \((1-\alpha_3) f(0, x_2, x_1) + \alpha_3 f(1, x_2, x_1) =: f'_{\alpha_3}(x_2, x_1)\), halving the table.]
r_2(X) := (1-X)\textcolor{orange}{\sum_{x_1} f(\alpha_3, 0, x_1)} + X\textcolor{93c47d}{\sum_{x_1} f(\alpha_3, 1, x_1)}
= (1-X)\textcolor{orange}{\sum_{x_1} f'_{\alpha_3}(0, x_1)} + X\textcolor{93c47d}{\sum_{x_1} f'_{\alpha_3}(1, x_1)}
[Figure: a second fold along \(x_2\) with weights \((1-\alpha_2)\) and \(\alpha_2\) gives \(f''_{\alpha_3, \alpha_2}(x_1)\).]
r_3(X) := (1-X)\textcolor{orange}{f''_{\alpha_3, \alpha_2}(0)} + X\textcolor{93c47d}{f''_{\alpha_3, \alpha_2}(1)}
  • Updating the original polynomial with challenges is tricky
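
To make the two operations concrete, here is a minimal sketch in Rust, assuming a toy 64-bit prime field (the Goldilocks prime) and using the hypercube table \(f_1, \dots, f_8\) from above; `round_poly` and `fold` are illustrative names, not from any particular library.

```rust
// A minimal sketch of one MLE sumcheck round over a toy prime field.
const P: u64 = 0xffff_ffff_0000_0001; // Goldilocks prime, for illustration

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn sub(a: u64, b: u64) -> u64 { add(a, P - b) }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Evaluate r_i(X) = (1-X)*sum_x f(0,x) + X*sum_x f(1,x) at X = 0 and X = 1:
/// each half-sum is an independent reduction, so both parallelise.
fn round_poly(f: &[u64]) -> (u64, u64) {
    let half = f.len() / 2;
    let r0 = f[..half].iter().fold(0, |acc, &v| add(acc, v));
    let r1 = f[half..].iter().fold(0, |acc, &v| add(acc, v));
    (r0, r1)
}

/// Fold along the top variable with challenge alpha:
/// f'_alpha(x) = (1-alpha)*f(0,x) + alpha*f(1,x).
/// Every output entry is independent, so the loop parallelises trivially.
fn fold(f: &[u64], alpha: u64) -> Vec<u64> {
    let half = f.len() / 2;
    (0..half)
        .map(|x| add(mul(sub(1, alpha), f[x]), mul(alpha, f[half + x])))
        .collect()
}

fn main() {
    // f over {0,1}^3 with f[(x_3 x_2 x_1)_2] = f_1, ..., f_8
    let f = vec![2, 3, 4, 1, 0, 8, 3, 2];
    let (r0, r1) = round_poly(&f);
    println!("r_1(0) = {r0}, r_1(1) = {r1}");
    println!("f'_alpha3 = {:?}", fold(&f, 5)); // alpha_3 = 5, arbitrary
}
```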

MLE Sumcheck

[Figure: the sequential round pipeline: starting from \(f(\mathbf{X})\), round \(i\) splits the current table into halves \(f_{i,o}\) and \(f_{i,e}\), computes their sums \(\sum\), feeds them to \(\textsf{hash}\) to obtain the challenge (\(\alpha_4, \alpha_3, \alpha_2, \alpha_1\) in turn), and folds the table before the next round.]

Round computation can be parallelised

Round \(i+1\) depends on the round \(i\) challenge

Need to process all terms to compute the challenge
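
This serial skeleton is the crux of the problem; a sketch under the same assumptions as before, reusing `round_poly`, `fold`, and the field helpers from the earlier snippet (`hash_to_field` is a toy stand-in for the Fiat-Shamir transcript hash, purely illustrative):

```rust
// Serial structure of the prover: each fold needs alpha_i, and alpha_i
// needs the full round sums, so rounds cannot overlap naively.
fn hash_to_field(r0: u64, r1: u64) -> u64 {
    mul(add(r0, 0x9e37_79b9_7f4a_7c15), add(r1, 1)) // NOT a real hash
}

fn prove(mut table: Vec<u64>) -> Vec<u64> {
    let mut alphas = Vec::new();
    while table.len() > 1 {
        let (r0, r1) = round_poly(&table); // parallel: a reduction over all entries
        let alpha = hash_to_field(r0, r1); // serial: needs both half-sums
        table = fold(&table, alpha);       // parallel, but blocked on alpha
        alphas.push(alpha);
    }
    alphas
}
```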

Sumcheck as a Backend

  • For R1CS arithmetisation [1]:
\begin{bmatrix} \\ & \bar{A} & \\ \\ \end{bmatrix} \hspace{-6pt} \begin{bmatrix} \\ \vec{z}\\ \\ \end{bmatrix} \circ \begin{bmatrix} \\ & \bar{B} & \\ \\ \end{bmatrix} \hspace{-6pt} \begin{bmatrix} \\ \vec{z}\\ \\ \end{bmatrix} = \begin{bmatrix} \\ & \bar{C} & \\ \\ \end{bmatrix} \hspace{-6pt} \begin{bmatrix} \\ \vec{z}\\ \\ \end{bmatrix}
G(x) := \left(\Bigg(\sum_{y \in \mathbb{B}_\mu}A(x,y)z(y)\Bigg)\Bigg(\sum_{y \in \mathbb{B}_\mu}B(x,y)z(y)\Bigg) - \sum_{y \in \mathbb{B}_\mu}C(x,y)z(y)\right) \textcolor{orange}{\textsf{eq}(x, \tau)}
  • For Plonkish arithmetisation [2]:
\sum_{x} a(x)b(x)\textsf{eq}(x) - c(x)\textsf{eq}(x) = 0
\begin{aligned} & \textcolor{grey}{S_{\textsf{add}}(\vec{x})} \Big( \textcolor{red}{L(\vec{x})} + \textcolor{93c47d}{R(\vec{x})} \Big) + \textcolor{grey}{S_{\textsf{mul}}(\vec{x})} \Big(\textcolor{red}{L(\vec{x})} \cdot \textcolor{93c47d}{R(\vec{x})}\Big) + & \textcolor{grey}{S_{\textsf{gate}}(\vec{x})} G\Big[ \textcolor{red}{L(\vec{x})}, \textcolor{93c47d}{R(\vec{x})} \Big] - \textcolor{skyblue}{O(\vec{x})} = 0 \end{aligned}
  • Sumcheck can work as a proof-system- and arithmetisation-agnostic backend (a sketch follows below):
\textsf{combine}_{\text{R1CS}}(a,b,c,d) := abd - cd
\textsf{combine}_{\text{plonkish}}(a,b,c, s_1, s_2, s_3) := s_1(a + b) + s_2 ab + s_3 G(a, b) - c
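
A sketch of what arithmetisation-agnostic could look like in code: the backend sums a caller-supplied `combine` closure over the hypercube. The names (`hypercube_sum`, the placeholder gate `g`) are illustrative, and plain `i128` arithmetic stands in for field arithmetic.

```rust
// combine_R1CS(a, b, c, d) := abd - cd
fn combine_r1cs(v: &[i128]) -> i128 {
    let (a, b, c, d) = (v[0], v[1], v[2], v[3]);
    a * b * d - c * d
}

// combine_plonkish(a, b, c, s1, s2, s3) := s1(a+b) + s2*ab + s3*G(a,b) - c
fn combine_plonkish(v: &[i128]) -> i128 {
    let (a, b, c, s1, s2, s3) = (v[0], v[1], v[2], v[3], v[4], v[5]);
    let g = |x: i128, y: i128| x * x * y; // placeholder custom gate G(a, b)
    s1 * (a + b) + s2 * a * b + s3 * g(a, b) - c
}

/// The claimed sum over the hypercube that sumcheck then proves,
/// parameterised by the arithmetisation's combine function.
fn hypercube_sum(cols: &[Vec<i128>], combine: impl Fn(&[i128]) -> i128) -> i128 {
    (0..cols[0].len())
        .map(|x| combine(&cols.iter().map(|c| c[x]).collect::<Vec<_>>()))
        .sum()
}

fn main() {
    // Two satisfied R1CS rows: a*b*d - c*d = 0 at every point
    let cols = vec![vec![1, 2], vec![3, 4], vec![3, 8], vec![1, 1]]; // a, b, c, d
    assert_eq!(hypercube_sum(&cols, combine_r1cs), 0);
    let _ = combine_plonkish; // same backend, different combine
    println!("R1CS rows satisfied");
}
```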

MLE Sumcheck Round

s = \sum_{\vec{x}}f(\vec{x}) \qquad \text{or} \qquad s = \sum_{\vec{x}}p(\vec{x}) q(\vec{x})

[Figure: anatomy of round \(i\): (1) every table entry \(f_1, \dots, f_8\) and \(g_1, \dots, g_8\) is linearised via \(L(c)\); (2) the linearised entries are merged by \(\textsf{combine}(f,g)\); (3) summing gives the round-polynomial evaluations \(\{r_i(c)\}_{c \in [d]}\), which are fed to \(\textsf{hash}\) to produce \(\alpha_i\); (4) the state is folded with \(\alpha_i\).]

Challenges

  • Step 1: Linearisation is parallelisable
  • Step 2: \(\textsf{combine}\) is parallelisable
  • Step 4: Folding of state is parallelisable

The Good

  • Step 3: \(\textsf{hash}\) needs all \(\{r_i(c)\}_{c \in \{0,1,\dots, d-1\}}\)
  • Memory constraints
    • An RTX 4090 can fit one R1CS instance of size \(2^{19}\)
    • For larger sizes, break the computation down into chunks of size \(2^{19}\)
    • Reads: all witness data will be read twice into the L2 cache
    • Writes: updates in the first few rounds will need to be written back to memory
  • Challenge computation depends on the entire witness state
  • Can we perform "look-ahead" computation instead of waiting for challenges?

The Bad

Witness-Challenge Separation

[1] Justin Thaler, The Sum-Check Protocol over Fields of Small Characteristic, 2023

\implies f(\alpha_1, \alpha_2, \dots, \alpha_i, \mathbf{X}) = \textcolor{d089ff}{(1-\alpha_1)}f(\textcolor{pink}{0}, \alpha_2, \dots, \alpha_i, \mathbf{X}) + \textcolor{d089ff}{(\alpha_1)}f(\textcolor{pink}{1}, \alpha_2, \dots, \alpha_i, \mathbf{X})

\therefore f(\alpha_1, \alpha_2, \dots, \alpha_i, \mathbf{X}) = \textcolor{d089ff}{(1-\alpha_1)(1-\alpha_2)}f(\textcolor{pink}{0, 0}, \alpha_3, \dots, \alpha_i, \mathbf{X}) + \textcolor{d089ff}{(1-\alpha_1)(\alpha_2)}f(\textcolor{pink}{0, 1}, \alpha_3, \dots, \alpha_i, \mathbf{X}) + \textcolor{d089ff}{(\alpha_1)(1-\alpha_2)}f(\textcolor{pink}{1, 0}, \alpha_3, \dots, \alpha_i, \mathbf{X}) + \textcolor{d089ff}{(\alpha_1)(\alpha_2)}f(\textcolor{pink}{1, 1}, \alpha_3, \dots, \alpha_i, \mathbf{X})

Each summand factors into a challenge term (the \(\textcolor{d089ff}{\text{purple}}\) coefficient, a product of \(\alpha_j\)'s and \((1-\alpha_j)\)'s) and a witness term (an evaluation of \(f\) at a \(\textcolor{pink}{\text{Boolean}}\) prefix). Expanding all \(i\) bound variables this way gives

f(\vec{\alpha}_i, \mathbf{X}) = \sum_{j=1}^{2^i} L_j(\vec{\alpha}_i) \, f(\underline{j-1}, \mathbf{X})

where \(\underline{j-1}\) denotes the \(i\)-bit binary expansion of \(j-1\).

In matrix form:

f(\vec{\alpha}_i, \mathbf{X}) = \begin{bmatrix} L_1(\vec{\alpha}_i) & L_2(\vec{\alpha}_i) & L_3(\vec{\alpha}_i) & \dots & L_{2^i}(\vec{\alpha}_i) \end{bmatrix}_{1 \times 2^i} \boxdot \begin{bmatrix} a(\underline{0}, \mathbf{X}) \\ a(\underline{1}, \mathbf{X}) \\ a(\underline{2}, \mathbf{X}) \\ \vdots \\ a(\underline{\small 2^i-1}, \mathbf{X}) \end{bmatrix}_{2^{i} \times \frac{n}{2^i}}

and symmetrically for a second column: the \(\frac{n}{2^i} \times 2^i\) matrix of slices \(b(\underline{0}, \mathbf{X}), \dots, b(\underline{\small 2^{i}-1}, \mathbf{X})\) \(\boxdot\) the \(2^{i} \times 1\) column \(\begin{bmatrix} L_1(\vec{\alpha}_i) & L_2(\vec{\alpha}_i) & \dots & L_{2^i}(\vec{\alpha}_i) \end{bmatrix}^\top\).

  • Matrix operations are faster on GPUs
  • We can generalise this to any sumcheck instance (a sketch follows below)
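
A sketch of this matrix view under the same toy-field assumptions as earlier (`challenge_tensor` and `apply_challenges` are illustrative names): the challenge term expands into the \(2^i\) coefficients \(L_j(\vec{\alpha}_i)\), and applying challenges to the witness becomes a vector-matrix product, exactly the shape GPUs handle well.

```rust
/// Expand (alpha_1, ..., alpha_i) into the 2^i coefficients L_j(alpha):
/// L_j is the product of alpha_k / (1-alpha_k) per bit of j-1, with
/// alpha_1 on the most significant bit. Reuses mul/sub from the first sketch.
fn challenge_tensor(alphas: &[u64]) -> Vec<u64> {
    let mut l = vec![1u64];
    for &a in alphas {
        let mut next = Vec::with_capacity(2 * l.len());
        for &v in &l {
            next.push(mul(v, sub(1, a))); // bit 0: (1 - alpha_k)
            next.push(mul(v, a));         // bit 1: alpha_k
        }
        l = next;
    }
    l
}

/// f(alpha, X) = [L_1 ... L_{2^i}] (1 x 2^i) times the 2^i x (n/2^i)
/// matrix whose rows are the witness slices a(j, X).
fn apply_challenges(l: &[u64], rows: &[Vec<u64>]) -> Vec<u64> {
    (0..rows[0].len())
        .map(|c| rows.iter().zip(l).fold(0, |acc, (row, &lj)| add(acc, mul(lj, row[c]))))
        .collect()
}
```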
[Figure: look-ahead computation on a 16-entry table \(a_1, \dots, a_{16}\) (and a second table \(b_1, \dots, b_{16}\), and so on per column): partial sums \({\scriptsize \sum}_{1}^{8}, {\scriptsize \sum}_{9}^{16}; {\scriptsize \sum}_{1}^{4}, \dots; {\scriptsize \sum}_{1}^{2}, \dots\) are built as a binary tree, then combined (\(\odot\)) with precomputed challenge products \(1; \bar{\alpha}_1, \alpha_1; \bar{\alpha}_1\bar{\alpha}_2, \bar{\alpha}_1\alpha_2, \alpha_1\bar{\alpha}_2, \alpha_1\alpha_2; \bar{\alpha}_1\bar{\alpha}_2\bar{\alpha}_3, \dots\) (where \(\bar{\alpha}_j := 1-\alpha_j\)) to form each round's \(\textsf{hash}\) input.]
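
A sketch of the look-ahead idea for the linear case \(s = \sum_{\vec{x}} f(\vec{x})\), reusing the field helpers and `challenge_tensor` from the earlier sketches: every node of the \(\sum\)-tree is computable before any challenge arrives, and each round's half-sums then reduce to short dot products. For \(m\) multiplicands the same idea needs all \((2^i)^m\) slice-product sums, which is the memory cost flagged below.

```rust
/// S_i[j] = sum over X of a(j, X), for every i-bit prefix j:
/// the nodes of the partial-sum tree, challenge-free.
fn block_sums(a: &[u64], i: usize) -> Vec<u64> {
    let block = a.len() >> i;
    a.chunks(block)
        .map(|c| c.iter().fold(0, |s, &v| add(s, v)))
        .collect()
}

/// Round-i half sums from the precomputed block sums, where
/// l = challenge_tensor(&alphas[..i-1]):
/// r_i(0) = sum_j l[j] * S_i[2j],  r_i(1) = sum_j l[j] * S_i[2j+1].
fn round_from_lookahead(s_i: &[u64], l: &[u64]) -> (u64, u64) {
    let mut r = (0, 0);
    for (j, &lj) in l.iter().enumerate() {
        r.0 = add(r.0, mul(lj, s_i[2 * j]));
        r.1 = add(r.1, mul(lj, s_i[2 * j + 1]));
    }
    r
}
```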

Challenges

  • No need to update the witness state after receiving a challenge
  • Some computation can be reused within the look-ahead computation
  • Matrix operations in the look-ahead computation can be faster on GPUs

The Good

  • Need \(2^{i(d-1)}\) cache memory to store look-ahead witnesses in round \(i\)
  • Need \(2^{i}\) cache memory to store challenges
  • Memory constraints \(\implies\) use witness-challenge separation only up to a certain round
  • Cannot perform the look-ahead computation in a single read
  • Arguably a more complex algorithm, and hence implementation

The Bad

[Figure: look-ahead splitting for \(s = \sum_{\vec{x}} p(\vec{x})q(\vec{x})\): the tables \(p\) and \(q\) are split recursively by Boolean prefixes (\(p_0, p_1\); \(p_{00}, \dots, p_{11}\); prefixes \(000, \dots, 111\); \(0000, \dots, 1111\)), and cross products (\(\times\)) between corresponding \(p\)- and \(q\)-slices are accumulated level by level, with more products needed at each deeper level.]
Round 1

f_1 = \begin{bmatrix} 2 & 3 & 4 & 1 \\ 0 & 8 & 3 & 2 \\ \end{bmatrix} \qquad g_1 = \begin{bmatrix} 1 & 4 & 2 & 5 \\ 4 & 3 & 1 & 6 \\ \end{bmatrix}

f_1 \circledast g_1 = \begin{bmatrix} 2 & 12 & 8 & 5 \\ 8 & 9 & 4 & 6 \\ 0 & 32 & 6 & 10 \\ 0 & 24 & 3 & 12 \\ \end{bmatrix} \xrightarrow{\oplus} \begin{bmatrix} 27 \\ 27 \\ 48 \\ 39 \\ \end{bmatrix}

r_1(c) = \Big( \Gamma_1(c) \circledast \Gamma_1(c) \Big) \odot \begin{bmatrix} 27 \\ 27 \\ 48 \\ 39 \\ \end{bmatrix}, \qquad \begin{bmatrix} \gray{(1-c)} \\ \gray{c} \end{bmatrix} = \Gamma_1(c), \qquad \Gamma_1(c) \circledast \Gamma_1(c) = \begin{bmatrix*}[l] \gray{(1-c)^2} \\ \gray{(1-c)c} \\ \gray{c(1-c)} \\ \gray{c^2} \end{bmatrix*}
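
The same numbers, checked with a small program (plain integers, since no modular reduction is needed at this size); the loops spell out \(\circledast\), \(\oplus\), and \(\odot\):

```rust
fn main() {
    let f = [[2i64, 3, 4, 1], [0, 8, 3, 2]];
    let g = [[1i64, 4, 2, 5], [4, 3, 1, 6]];
    // circled-ast then circled-plus: entry (i, j) holds sum_x f_i(x) * g_j(x)
    let mut sums = Vec::new();
    for fr in &f {
        for gr in &g {
            sums.push(fr.iter().zip(gr).map(|(a, b)| a * b).sum::<i64>());
        }
    }
    assert_eq!(sums, vec![27, 27, 48, 39]);
    // r_1(c) = (Gamma_1(c) tensor Gamma_1(c)) dot [27, 27, 48, 39]
    let r1 = |c: i64| {
        let gamma = [1 - c, c];
        let mut acc = 0i64;
        for i in 0..2 {
            for j in 0..2 {
                acc += gamma[i] * gamma[j] * sums[2 * i + j];
            }
        }
        acc
    };
    assert_eq!(r1(0) + r1(1), 66); // equals the claimed sum s = sum f*g
    println!("r_1(0) = {}, r_1(1) = {}, r_1(2) = {}", r1(0), r1(1), r1(2));
}
```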

Round 2

f_2 = \begin{bmatrix} 2 & 3 \\ 4 & 1 \\ 0 & 8 \\ 3 & 2 \\ \end{bmatrix} \qquad g_2 = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 4 & 3 \\ 1 & 6 \\ \end{bmatrix}

f_2 \circledast g_2 = \begin{bmatrix} 2 & 12 \\ 4 & 15 \\ 8 & 9 \\ 2 & 18 \\ 4 & 4 \\ 8 & 5 \\ 16 & 3 \\ 4 & 6 \\ 0 & 32 \\ 0 & 40 \\ 0 & 24 \\ 0 & 48 \\ 3 & 8 \\ 6 & 10 \\ 12 & 6 \\ 3 & 12 \\ \end{bmatrix} \xrightarrow{\oplus} \begin{bmatrix} 14 \\ 19 \\ 17 \\ 20 \\ 8 \\ 13 \\ 19 \\ 10 \\ 32 \\ 40 \\ 24 \\ 48 \\ 11 \\ 16 \\ 18 \\ 15 \\ \end{bmatrix}

r_2(c) = \Big( \Gamma_2(c) \circledast \Gamma_2(c) \Big) \odot \begin{bmatrix} 14 \\ 19 \\ \vdots \\ 15 \\ \end{bmatrix}, \qquad \begin{bmatrix*}[l] \gray{(1-\alpha_1)(1-c)} \\ \gray{(1-\alpha_1)c} \\ \gray{\alpha_1(1-c)} \\ \gray{\alpha_1c} \end{bmatrix*} = \Gamma_2(c), \qquad \Gamma_2(c) \circledast \Gamma_2(c) = \begin{bmatrix*}[l] \gray{(1-\alpha_1)^2(1-c)^2} \\ \gray{(1-\alpha_1)^2(1-c)c} \\ \gray{(1-\alpha_1)\alpha_1(1-c)^2} \\ \gray{(1-\alpha_1)\alpha_1(1-c)c} \\ \gray{(1-\alpha_1)^2(1-c)c} \\ \gray{(1-\alpha_1)^2c^2} \\ \gray{(1-\alpha_1)\alpha_1(1-c)c} \\ \gray{(1-\alpha_1)\alpha_1c^2} \\ \gray{(1-\alpha_1)\alpha_1(1-c)^2} \\ \gray{(1-\alpha_1)\alpha_1(1-c)c} \\ \gray{\alpha_1^2(1-c)^2} \\ \gray{\alpha_1^2(1-c)c} \\ \gray{(1-\alpha_1)\alpha_1(1-c)c} \\ \gray{(1-\alpha_1)\alpha_1c^2} \\ \gray{\alpha_1^2(1-c)c} \\ \gray{\alpha_1^2c^2} \\ \end{bmatrix*}
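
And Round 2 in code, making the separation concrete: the 16 pairwise sums depend only on the witness and could be precomputed, while \(\alpha_1\) enters only through \(\Gamma_2\). The final assertion checks the protocol identity \(r_2(0) + r_2(1) = r_1(\alpha_1)\), with \(\alpha_1 = 5\) as an arbitrary stand-in challenge:

```rust
fn r2(sums: &[i64; 16], a1: i64, c: i64) -> i64 {
    // Gamma_2(c) = [(1-a1)(1-c), (1-a1)c, a1(1-c), a1*c]
    let gamma = [(1 - a1) * (1 - c), (1 - a1) * c, a1 * (1 - c), a1 * c];
    let mut acc = 0;
    for i in 0..4 {
        for j in 0..4 {
            acc += gamma[i] * gamma[j] * sums[4 * i + j]; // tensor dot sums
        }
    }
    acc
}

fn main() {
    let sums = [14i64, 19, 17, 20, 8, 13, 19, 10, 32, 40, 24, 48, 11, 16, 18, 15];
    let a1 = 5i64; // arbitrary stand-in for the verifier's challenge
    // r_1(alpha_1) from Round 1's sums [27, 27, 48, 39]
    let round1 = [27i64, 27, 48, 39];
    let g1 = [(1 - a1) * (1 - a1), (1 - a1) * a1, a1 * (1 - a1), a1 * a1];
    let r1_at_a1: i64 = (0..4).map(|k| g1[k] * round1[k]).sum();
    assert_eq!(r2(&sums, a1, 0) + r2(&sums, a1, 1), r1_at_a1);
    println!("r_2(0) + r_2(1) = r_1(alpha_1) = {r1_at_a1}");
}
```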

Memory

m \times 2^n \qquad (2^i)^m \times 2^{n-i} \qquad (2^i)^m \qquad (2^i)^m \qquad 2^i

Multiplications

0 \qquad (2^i)^m \times 2^{n-i} \qquad 0 \qquad (2^i)^m \qquad 2^i
c = pq \qquad d = cr

[Figure: extending look-ahead to chained products such as \(c = pq\) and \(d = cr\): the tables \(p\), \(q\), \(r\) are split into slices (\(p_0, p_1, q_0, q_1, r_0, r_1\), then \(p_{00}, \dots, r_{11}\)), and intermediate slice products such as \(c_0 = p_0q_1\) and \(c_1 = p_1q_0\) are tracked across rounds.]