Twist and Shout

Evolution of SNARKs

2016: Groth'16
  • Fast prover
  • Smallest proof
  • Very fast verifier
  • Circuit trusted-setup

2018: Bootle / Bulletproofs
  • Transparent
  • Acceptable (log) proof size
  • Great for range checks
  • Slow (linear) verifier
  • Not-so-fast prover

2019: Plonk
  • Fast prover
  • Very small proof
  • Fast verifier
  • Universal trusted setup
  • FFTs

2022: HyperPlonk
  • No FFTs
  • High-degree custom gates
  • Sumcheck-based
  • Permutation check

2025: Twist & Shout
  • Fully multi-linear
  • Sparse permutation check
  • Sparse commitments
  • Commitment key size?

RISCV Example

.section .data
array:  .word 0, 0, 0, 0  # Initially, memory is all zeros

.section .text
.global _start

_start:
    # Step 1: Load base address of array
    la t0, array  

    # Step 2: Store Initial Values Explicitly
    li t1, 10     # Load immediate 10 into t1
    li t2, 20     # Load immediate 20 into t2
    sw t1, 0(t0)  # Store t1 (10) into array[0]
    sw t2, 4(t0)  # Store t2 (20) into array[1]

    # Step 3: Explicitly Load Values Back into Registers
    lw t1, 0(t0)  # Load array[0] (10) into t1
    lw t2, 4(t0)  # Load array[1] (20) into t2

    # Step 4: Perform Arithmetic Operations
    add t3, t1, t2  # t3 = t1 + t2 (10 + 20 = 30)
    mul t4, t1, t2  # t4 = t1 * t2 (10 * 20 = 200)

    # Step 5: Store Computation Results in Memory
    sw t3, 8(t0)   # Store sum at array[2]
    sw t4, 12(t0)  # Store multiplication result at array[3]

    # Step 6: Exit Program
    li a7, 10      # syscall for exit
    ecall

a = [0, \ 0, \ 0, \ 0]
a = [\textcolor{orange}{10}, \ \textcolor{orange}{20}, \ 0, \ 0]
t_1 = 10, \ t_2 = 20
\begin{aligned} t_3 &= t_1 + t_2 \\ t_4 &= t_1 \times t_2 \end{aligned}
a = [10, \ 20, \ \textcolor{orange}{30}, \ \textcolor{orange}{200}]

Memory-checking Protocols

Write 10

Read 10

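\textcolor{red}{\textsf{rv}(j)} = \textcolor{lightgreen}{\textsf{wv}(j')} \quad \text{s.t. } j' < j

Prove: the value read at cycle \(j\) equals the value written at cycle \(j'\), the most recent write to the same address.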

(Figure: \(\text{Memory}\) and user-input; the table lists the memory contents after each operation of the program.)

operation   0x2000  0x2004  0x2008  0x200C
(initial)   0       0       0       0
store 10    10      0       0       0
store 20    10      20      0       0
load 10     10      20      0       0
load 20     10      20      0       0
store 30    10      20      30      0
store 200   10      20      30      200

🎯 Prove that the reads and writes to and from memory were performed correctly.
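
The statement to be proved can be sketched as a plain replay of the trace. This is only an illustration of the property, not a proof system: the `check_trace` helper, the `(op, address, value)` tuple format and the tampered trace are assumptions made for this example.

```python
def check_trace(trace, size_words=4, base=0x2000):
    """Replay a trace and check that every load returns the last stored value."""
    # Memory starts as all zeros, as in the RISC-V example.
    memory = {base + 4 * i: 0 for i in range(size_words)}
    for op, addr, value in trace:
        if op == "store":
            memory[addr] = value
        elif op == "load":
            if memory[addr] != value:
                return False
        else:
            raise ValueError(f"unknown op {op!r}")
    return True


# Memory operations of the RISC-V example, in program order.
trace = [
    ("store", 0x2000, 10),
    ("store", 0x2004, 20),
    ("load", 0x2000, 10),
    ("load", 0x2004, 20),
    ("store", 0x2008, 30),
    ("store", 0x200C, 200),
]

honest = check_trace(trace)
# A prover claiming to have loaded 11 from 0x2000 should be rejected.
tampered = check_trace(trace[:2] + [("load", 0x2000, 11)] + trace[3:])
```

A memory-checking protocol has to convince the verifier of the same fact without the verifier replaying the whole trace.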

Trace Reordering

(Figure: the program trace with steps numbered 1 to 13; memory operations on the same value are paired: writes 10 / reads 10, writes 20 / reads 20.)

\(\therefore\) The reordered memory operations are a permutation of the original memory operations!
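
A minimal sketch of the reordering idea, assuming the trace is given as `(step, op, address, value)` tuples with the step numbers of the RISC-V example. The helper names are hypothetical. The trace is sorted by address and then by step, after which consistency becomes a check between neighbouring rows, and the sorted trace must be a permutation of the original.

```python
from collections import Counter

# (step, op, address, value) for the memory operations of the example.
trace = [
    (4, "W", 0x2000, 10),
    (5, "W", 0x2004, 20),
    (6, "R", 0x2000, 10),
    (7, "R", 0x2004, 20),
    (10, "W", 0x2008, 30),
    (11, "W", 0x200C, 200),
]


def reorder(ops):
    """Group operations by address, keeping program order within each address."""
    return sorted(ops, key=lambda t: (t[2], t[0]))


def locally_consistent(reordered):
    """Each read must match the row just before it (or zero for a fresh address)."""
    prev_addr, prev_value = None, 0
    for _step, op, addr, value in reordered:
        if addr != prev_addr:
            prev_value = 0  # first access to this address: memory starts at zero
        if op == "R" and value != prev_value:
            return False
        prev_addr, prev_value = addr, value
    return True


reordered = reorder(trace)
is_permutation = Counter(reordered) == Counter(trace)
consistent = locally_consistent(reordered)

# A read that disagrees with the preceding write is caught by the local check.
bad = reorder(trace[:2] + [(6, "R", 0x2000, 99)] + trace[3:])
bad_consistent = locally_consistent(bad)
```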

Spice: Reordering-free approach

Each memory operation contributes a tuple (address, value, trace step):

\text{Read streams}
(0\text{x}2000, \ \textcolor{orange}{10}, \ 6)
(0\text{x}2004, \ \textcolor{orange}{20}, \ 7)
(0\text{x}2008, \ \textcolor{orange}{30}, \ 10)
(0\text{x}200\text{C}, \ \textcolor{orange}{200}, \ 11)

\text{Write streams}
(0\text{x}2000, \ \textcolor{orange}{10}, \ 4)
(0\text{x}2004, \ \textcolor{orange}{20}, \ 5)
(0\text{x}2008, \ \textcolor{orange}{30}, \ 10)
(0\text{x}200\text{C}, \ \textcolor{orange}{200}, \ 11)

  • If the local constraints are valid AND
  • If the read stream is a permutation of the write stream
  • Then, memory is consistent!
  • Problem
    • Read and write sets grow very big \(\implies\) linear work for prover
  • Solution
    • Use multi-set hashing
\textsf{MSH}(S_1 \cup S_2) \equiv \textsf{MSH}(S_1) + \textsf{MSH}(S_2)
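
The homomorphic property above can be illustrated with a toy additive multiset hash. This is a sketch under stated assumptions: hashing each element with SHA-256 and adding the results modulo a prime is chosen here only to show the algebra, the modulus `P` is illustrative and not a secure parameter, and none of this is claimed to be the construction Spice uses.

```python
import hashlib

P = 2**61 - 1  # illustrative prime modulus, NOT a secure parameter choice


def hash_element(item):
    """Map one stream element to a number modulo P."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    return int.from_bytes(digest, "big") % P


def msh(multiset):
    """Additive multiset hash: order-independent, and union becomes addition."""
    return sum(hash_element(x) for x in multiset) % P


s1 = [(0x2000, 10, 4), (0x2004, 20, 5)]
s2 = [(0x2008, 30, 10)]

union_hash = msh(s1 + s2)
sum_of_hashes = (msh(s1) + msh(s2)) % P
```

Because the hash of a union is the sum of the hashes, a running digest can be updated one element at a time instead of holding the whole read and write sets.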

Shout for ROM

address value
0x2000 10
0x2004 20
0x2008 30
0x200C 200
... ...
... ...

\(\text{ROM}\)

  • Use one-hot encoding for addresses
  • PCS and sum-check work well with sparsity
  • \(T:\) total number of cycles in the program
  • \(K:\) total memory size
  • ⚠️ Commit to \(\mathcal{O}(K \cdot T)\) witness?
    • EC vs hash-based PCS
  • \(\textsf{ra}(k, j):\) \(k\)-th bit of the one-hot encoding of the address read at cycle \(j\)
\implies \textcolor{violet}{\textsf{rv}}(j) = \sum_{k \in [\ \text{rom table} \ ] } \textcolor{violet}{\textsf{ra}}(k, j) \cdot \textsf{val}(k)
  • RHS is multi-linear, just check at a random point
  • Still need to prove correctness of one-hot encoding
(Figure: one-hot encoding of each read address as a length-\(K\) bit vector with a single 1, e.g. \(0\ldots000\textcolor{red}{1}\), \(0\ldots00\textcolor{red}{1}0\), \(0\ldots0\textcolor{red}{1}00\), \(0\ldots\textcolor{red}{1}000\), \(\textcolor{red}{1}\ldots0000\).)
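
The lookup relation \(\textsf{rv}(j) = \sum_k \textsf{ra}(k, j) \cdot \textsf{val}(k)\) can be sketched over plain integers. The table indices are word indices rather than byte addresses, and the sequence of reads is hypothetical; both are assumptions for illustration.

```python
def one_hot(index, length):
    """Length-`length` vector with a single 1 at position `index`."""
    return [1 if k == index else 0 for k in range(length)]


val = [10, 20, 30, 200]   # ROM contents, so K = 4
reads = [0, 1, 3, 2, 0]   # hypothetical word indices read at cycles 0..4, so T = 5

# ra[j][k] is 1 exactly when cycle j reads cell k.
ra = [one_hot(addr, len(val)) for addr in reads]

# rv[j] = sum_k ra[j][k] * val[k]: the inner product selects one table entry.
rv = [sum(ra_j[k] * val[k] for k in range(len(val))) for ra_j in ra]
```

The witness `ra` has \(K \cdot T\) entries but only \(T\) of them are non-zero, which is the sparsity the bullets refer to.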

  • To prove correctness of the one-hot encoding, check for every read:
    • \(\textcolor{violet}{\textsf{ra}}(k, j) \in \{0, 1\}\) for all \(k, j\)
\implies \textcolor{violet}{\textsf{ra}}(k, j) \cdot (\textcolor{violet}{\textsf{ra}}(k, j) - 1) = 0
    • \(\textcolor{violet}{\textsf{ra}}(k, j)\) is \(1\) for exactly one \(k\)
\implies \sum_{k \in [\text{ rom table }]} \textcolor{violet}{\textsf{ra}}(k, j) = 1
  • Prover commits to \((K-1)\) 0's and one 1 per read
  • ⚠️ For a very large table \(K \approx 2^{64}\), commitment key size is \(K \times T\) 😱
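
The two one-hot conditions can be checked directly on a candidate row. This is a sketch over plain integers with a hypothetical `is_one_hot` helper; in a proof system the same two equations would be enforced algebraically rather than by inspecting every entry.

```python
def is_one_hot(row):
    """Check booleanity (x * (x - 1) == 0) and that the entries sum to 1."""
    booleanity = all(x * (x - 1) == 0 for x in row)
    sums_to_one = sum(row) == 1
    return booleanity and sums_to_one


valid = is_one_hot([0, 0, 1, 0])
two_ones = is_one_hot([0, 1, 1, 0])      # boolean, but the sum is 2
all_zero = is_one_hot([0, 0, 0, 0])      # boolean, but the sum is 0
not_boolean = is_one_hot([2, -1, 0, 0])  # sums to 1, but entries are not bits
```

The last case shows why both conditions are needed: the sum check alone accepts a row whose entries are not bits.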

  • Hint: use a slightly more efficient encoding!
  • Suppose \(K=16,\) then addr at index 6:
    • one-hot: \(000000000\textcolor{red}{1}000000\)
    • 2d one-hot: \((0\textcolor{red}{1}00, \ 00\textcolor{red}{1}0) \equiv (2, 1)_4\), since \(6 = 1 \cdot 4 + 2\)
  • The tensor product of the two short vectors recovers the full one-hot vector:
\begin{bmatrix} 0 \\ \textcolor{red}{1} \\ 0 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \\ \textcolor{red}{1} \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \\ 0 \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \\ 0 \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
(Figure: 2d one-hot encoding. Each address is written as a pair of one-hot vectors, each of length \(K^{\frac{1}{2}}\), from \((0\ldots00\textcolor{red}{1}, \ 0\ldots00\textcolor{red}{1})\) up to \((\textcolor{red}{1}\ldots000, \ \textcolor{red}{1}\ldots000)\).)
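
The \(K = 16\) example can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names. The lists place index 0 first, like the column vectors in the tensor-product equation, whereas the bit strings in the bullets place index 0 on the right.

```python
def one_hot(index, length):
    """Length-`length` vector with a single 1 at position `index`."""
    return [1 if k == index else 0 for k in range(length)]


def kron(u, v):
    """Kronecker product of two vectors: entry i * len(v) + j is u[i] * v[j]."""
    return [a * b for a in u for b in v]


K, side = 16, 4                 # side = K^(1/2)
high, low = divmod(6, side)     # 6 = 1 * 4 + 2
flat = kron(one_hot(high, side), one_hot(low, side))

# The same identity holds for every address in the table.
round_trips = all(
    kron(one_hot(a // side, side), one_hot(a % side, side)) == one_hot(a, K)
    for a in range(K)
)
```

Two vectors of length 4 carry the same information as one vector of length 16, which is what shrinks the commitment key.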

  • KZG commitment key size: \(K^{\frac{1}{2}} \cdot T\)
    • For Jolt, \(K=2^{64},\) so key size: \(2^{32} \cdot T\)
  • Can generalise with parameter \(d\): \(K^{\frac{1}{d}} \cdot T\)
  • Trade-off: per read, the prover commits to \(d\) one-hot vectors of length \(K^{\frac{1}{d}}\), i.e. \(d\) ones among \(d \cdot K^{\frac{1}{d}}\) entries
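
The effect of the parameter \(d\) can be tabulated with a small helper. This is a sketch assuming \(K = 2^{\texttt{log\_k}}\) and that \(d\) divides \(\log_2 K\); the function name and the returned field names are hypothetical. Each read is split into \(d\) one-hot chunks of length \(K^{1/d}\), and each chunk contains exactly one 1.

```python
def shout_costs(log_k, d):
    """Per-read sizes for a d-dimensional one-hot encoding of a table of size 2**log_k."""
    if log_k % d != 0:
        raise ValueError("this sketch assumes d divides log2(K)")
    chunk_len = 2 ** (log_k // d)  # K^(1/d)
    return {
        "chunk_len": chunk_len,              # length of each one-hot chunk
        "entries_per_read": d * chunk_len,   # committed entries per read
        "ones_per_read": d,                  # exactly one 1 per chunk
    }


costs = {d: shout_costs(64, d) for d in (1, 2, 4, 8)}
```

For \(K = 2^{64}\), moving from \(d = 1\) to \(d = 8\) shrinks each chunk from \(2^{64}\) entries to 256, while the number of ones per read grows from 1 to 8.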

  • Shout for general \(d\) (shown for \(d = 2\)):
    • Read address vectors: \(\textcolor{violet}{\textsf{ra}_1}(k_1, j), \ \textcolor{violet}{\textsf{ra}_2}(k_2, j)\), each one-hot of length \(K^{\frac{1}{2}}\)
  • To prove correctness of 2d one-hot encoding
    • \(\textcolor{violet}{\textsf{ra}_i}(k_i, j) \in \{0, 1\}\) for all \(i, k_i, j\)
\implies \textcolor{violet}{\textsf{ra}_i}(k_i, j) \cdot (\textcolor{violet}{\textsf{ra}_i}(k_i, j) - 1) = 0
    • \(\textcolor{violet}{\textsf{ra}_i}(k_i, j)\) is \(1\) for exactly one \(k_i\)
\implies \sum_{k_i} \textcolor{violet}{\textsf{ra}_i}(k_i, j) = 1
  • The read value sums over both chunks, indexing the table at \((k_2 \ \| \ k_1)\):
\implies \textcolor{violet}{\textsf{rv}}(j) = \sum_{k_1} \sum_{k_2} \textcolor{violet}{\textsf{ra}_1}(k_1, j) \cdot \textcolor{violet}{\textsf{ra}_2}(k_2, j) \cdot \textsf{val}((k_2 \ \| \ k_1))
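
The double sum can be checked against a direct table lookup. This is a sketch for \(d = 2\) with a hypothetical 16-entry table; \((k_2 \ \| \ k_1)\) is interpreted as the index \(k_2 \cdot K^{1/2} + k_1\), which is an assumption consistent with the \(K = 16\) example.

```python
def one_hot(index, length):
    """Length-`length` vector with a single 1 at position `index`."""
    return [1 if k == index else 0 for k in range(length)]


side = 4                      # K^(1/2)
val = list(range(100, 116))   # hypothetical ROM with K = 16 entries


def read_2d(addr):
    """rv = sum_{k1} sum_{k2} ra1[k1] * ra2[k2] * val[k2 || k1]."""
    k2, k1 = divmod(addr, side)               # addr = k2 * side + k1
    ra1, ra2 = one_hot(k1, side), one_hot(k2, side)
    return sum(
        ra1[a] * ra2[b] * val[b * side + a]
        for a in range(side)
        for b in range(side)
    )


matches_direct_lookup = all(read_2d(a) == val[a] for a in range(len(val)))
```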

Twist for RAM

  • Prove correctness of reads AND writes
  • Need to commit to memory state: not sparse ⚠️
  • Let's look at the changes in memory state!

  • Idea: commit to \(\Delta\)-memory \(\textcolor{lightgreen}{\textsf{inc}}(k, j)\) but prove statements about \(\textcolor{lightgreen}{\textsf{val}}(k, j)\)
\begin{aligned} \textcolor{lightgreen}{\textsf{inc}}(k, j) &= \textcolor{lightgreen}{\textsf{val}}(k, j + 1) - \textcolor{lightgreen}{\textsf{val}}(k, j) \\[3pt] &= \textcolor{lightgreen}{\textsf{wa}}(k, j) \cdot \left( \textcolor{lightgreen}{\textsf{wv}}(k, j) - \textcolor{lightgreen}{\textsf{val}}(k, j) \right) \end{aligned}
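
A small worked trace makes the two forms of \(\textsf{inc}\) concrete. This is a sketch under assumptions: memory has \(K = 4\) cells starting at zero, there is one write per cycle, the written value is treated as a single value per cycle, and the write sequence is hypothetical.

```python
K = 4
writes = [(0, 10), (1, 20), (0, 30), (3, 200)]  # hypothetical (cell, value) per cycle

val = [[0] * K]  # val[j][k]: contents of cell k before cycle j
inc = []         # inc[j][k]: change applied to cell k at cycle j

for cell, wv in writes:
    wa = [1 if k == cell else 0 for k in range(K)]  # one-hot write address
    inc_j = [wa[k] * (wv - val[-1][k]) for k in range(K)]
    inc.append(inc_j)
    val.append([val[-1][k] + inc_j[k] for k in range(K)])

# inc is also the difference between consecutive memory states.
difference_form_holds = all(
    inc[j][k] == val[j + 1][k] - val[j][k]
    for j in range(len(writes))
    for k in range(K)
)
# Each cycle touches at most one cell, so inc is sparse even though val is not.
nonzeros_per_cycle = [sum(1 for x in row if x != 0) for row in inc]
```

Overwriting cell 0 with 30 at cycle 2 gives an increment of 20, not 30, because the cell already held 10.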

  • Let's summarise the relations we want to prove:
\implies \textcolor{violet}{\textsf{rv}}(j) = \sum_{k \in [\ \text{memory} \ ] } \textcolor{violet}{\textsf{ra}}(k, j) \cdot \textcolor{lightgreen}{\textsf{val}}(k, j)
(Figure: \(\text{RAM}\) table of addresses and values, with each accessed address one-hot encoded as a length-\(K\) vector.)
  • But committing to \(\textcolor{lightgreen}{\textsf{val}}(k, j)\) is too expensive
  • Instead we commit to \(\textcolor{lightgreen}{\textsf{inc}}(k, j)\)
  • Next: check that value in memory is consistent with the value used in the program
\begin{aligned} \forall(k, j) \quad \textcolor{lightgreen}{\textsf{inc}}(k, j) &= \textcolor{lightgreen}{\textsf{wa}}(k, j) \cdot \left( \textcolor{lightgreen}{\textsf{wv}}(k, j) - \textcolor{lightgreen}{\textsf{val}}(k, j) \right) \end{aligned}
\forall(k, j) \quad \textcolor{lightgreen}{\textsf{val}}(k, j) := \sum_{j' < j} \textcolor{lightgreen}{\textsf{inc}}(k, j') = \sum_{j' \in [\text{ cycles }]} \textcolor{lightgreen}{\textsf{inc}}(k, j') \cdot \textsf{lt}(j', j)
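
The reconstruction of \(\textsf{val}\) from \(\textsf{inc}\) can be sketched as follows. The increments are hypothetical, memory is assumed to start at zero, and `lt`, `val` and `rv` are illustrative helper names mirroring the symbols in the formulas.

```python
K, T = 4, 4
# inc[j][k]: hypothetical increments, one write per cycle.
inc = [
    [10, 0, 0, 0],
    [0, 20, 0, 0],
    [20, 0, 0, 0],
    [0, 0, 0, 200],
]


def lt(a, b):
    """Less-than indicator: 1 if a < b, else 0."""
    return 1 if a < b else 0


def val(k, j):
    """val(k, j) = sum over all cycles j' of inc(k, j') * lt(j', j)."""
    return sum(inc[jp][k] * lt(jp, j) for jp in range(T))


def rv(read_cell, j):
    """rv(j) = sum_k ra(k, j) * val(k, j) for a one-hot read address."""
    ra = [1 if k == read_cell else 0 for k in range(K)]
    return sum(ra[k] * val(k, j) for k in range(K))


cell0_history = [val(0, j) for j in range(T + 1)]
```

Writing the prefix sum with `lt` turns the condition \(j' < j\) into a sum over all cycles.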

Learnings

  • Currently, Jolt spends about 50% of its time in memory-checking
  • To speed it up: can we use one-hot encoding for memory addresses?
    • EC commitments are very fast when the witness is binary
    • Hash-based commitments are also fast if working over binary tower fields
  • Additionally, permutation arguments based on quotienting are expensive
    • Can we avoid "large" values in checking permutation?
  • Twist and Shout answers positively to both of these questions

Twist for RAM

Shout for ROM

Twist and Shout - Journal Club

By Suyash Bagad
