Twist and Shout

Evolution of SNARKs

2016: Groth'16
- Fast prover
- Smallest proof
- Very fast verifier
- Drawback: circuit-specific trusted setup

2018: Bootle / Bulletproofs
- Transparent
- Acceptable (log) proof size
- Great for range checks
- Drawbacks: slow (linear) verifier, not-so-fast prover

2019: Plonk
- Fast prover
- Very small proof
- Fast verifier
- Universal trusted setup
- Drawback: FFTs

2022: HyperPlonk
- No FFTs
- High-degree custom gates
- Sumcheck-based
- Drawback: permutation check

2025: Twist & Shout
- Fully multi-linear
- Sparse permutation check
- Sparse commitments
- Open question: commitment key size?

RISC-V Example

.section .data
array:  .word 0, 0, 0, 0  # Initially, memory is all zeros

.section .text
.global _start

_start:
    # Step 1: Load base address of array
    la t0, array  

    # Step 2: Store Initial Values Explicitly
    li t1, 10     # Load immediate 10 into t1
    li t2, 20     # Load immediate 20 into t2
    sw t1, 0(t0)  # Store t1 (10) into array[0]
    sw t2, 4(t0)  # Store t2 (20) into array[1]

    # Step 3: Explicitly Load Values Back into Registers
    lw t1, 0(t0)  # Load array[0] (10) into t1
    lw t2, 4(t0)  # Load array[1] (20) into t2

    # Step 4: Perform Arithmetic Operations
    add t3, t1, t2  # t3 = t1 + t2 (10 + 20 = 30)
    mul t4, t1, t2  # t4 = t1 * t2 (10 * 20 = 200)

    # Step 5: Store Computation Results in Memory
    sw t3, 8(t0)   # Store sum at array[2]
    sw t4, 12(t0)  # Store multiplication result at array[3]

    # Step 6: Exit Program
    li a7, 10      # exit ecall number in RARS/Venus-style simulators (Linux uses 93)
    ecall

a = [0, \ 0, \ 0, \ 0]

a = [\textcolor{orange}{10}, \ \textcolor{orange}{20}, \ 0, \ 0]

a = [10, \ 20, \ \textcolor{orange}{30}, \ \textcolor{orange}{200}]

t_1 = 10, \ t_2 = 20

\begin{aligned} t_3 &= t_1 + t_2 \\ t_4 &= t_1 \times t_2 \end{aligned}

Memory-checking Protocols

Write 10

\textcolor{red}{\textsf{rv}(j)} = \textcolor{lightgreen}{\textsf{wv}(j')}

\text{where } j' < j \text{ is the most recent cycle that wrote to the address read at cycle } j

Prove: these two are equal

Memory-checking Protocols

address	value
0x2000	0
0x2004	0
0x2008	0
0x200C	0

\(\text{Memory}\)

Memory-checking Protocols

address	value
0x2000	0
0x2004	0
0x2008	0
0x200C	0

\(\text{Memory}\)

user-input

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	0
0x2004	0
0x2008	0
0x200C	0

store 10

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	0
0x2008	0
0x200C	0

store 20

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	20
0x2008	0
0x200C	0

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	20
0x2008	0
0x200C	0

load 10

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	20
0x2008	0
0x200C	0

load 20

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	20
0x2008	0
0x200C	0

store 30

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	0

store 200

Memory-checking Protocols

\(\text{Memory}\)

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	200

Prove the reads and writes to and from memory were performed correctly.

🎯

Trace Reordering

writes 10

writes 20

\(\therefore\) Reordered memory operations are a permutation of the original memory operations!
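
The reordering idea can be sketched in a few lines. This is a minimal illustration, not a proof system: the tuple layout `(timestamp, op, address, value)` and the function name `check_reordered` are assumptions made for this example, and the timestamps are simply the instruction indices of the RISC-V program.

```python
from collections import Counter

# (timestamp, op, address, value) for the memory operations of the example program.
trace = [
    (4, "store", 0x2000, 10),
    (5, "store", 0x2004, 20),
    (6, "load", 0x2000, 10),
    (7, "load", 0x2004, 20),
    (10, "store", 0x2008, 30),
    (11, "store", 0x200C, 200),
]

def check_reordered(original, reordered):
    """Check that `reordered` is an address-sorted permutation of `original`
    in which every load returns the most recently stored value."""
    # 1. Same multiset of operations (the permutation check).
    if Counter(original) != Counter(reordered):
        return False
    # 2. Sorted by (address, timestamp).
    keys = [(addr, ts) for ts, _, addr, _ in reordered]
    if keys != sorted(keys):
        return False
    # 3. Local check between neighbours: a load must see the previous value.
    prev_addr, prev_val = None, 0
    for ts, op, addr, val in reordered:
        if addr != prev_addr:
            prev_addr, prev_val = addr, 0  # memory starts as all zeros
        if op == "load" and val != prev_val:
            return False
        prev_val = val
    return True

reordered = sorted(trace, key=lambda t: (t[2], t[0]))
print(check_reordered(trace, reordered))
```

In a SNARK, step 1 is the expensive part: it has to be proven with a permutation argument rather than checked directly.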

Spice: Reordering-free approach

(0\text{x}2000, \ \textcolor{orange}{10}, \ 6)

(0\text{x}2004, \ \textcolor{orange}{20}, \ 7)

(0\text{x}2000, \ \textcolor{orange}{10}, \ 6)

(0\text{x}2004, \ \textcolor{orange}{20}, \ 7)

(0\text{x}2008, \ \textcolor{orange}{30}, \ 10)

(0\text{x200C}, \ \textcolor{orange}{200}, \ 11)

\text{Read streams}

\text{Write streams}

(0\text{x}2000, \ \textcolor{orange}{10}, \ 4)

(0\text{x}2004, \ \textcolor{orange}{20}, \ 5)

(0\text{x}2000, \ \textcolor{orange}{10}, \ 4)

(0\text{x}2004, \ \textcolor{orange}{20}, \ 5)

(0\text{x}2008, \ \textcolor{orange}{30}, \ 10)

(0\text{x200C}, \ \textcolor{orange}{200}, \ 11)

Source: Proving the correct execution of concurrent services in zero-knowledge, IACR 2018/907

If the local constraints hold AND
the read stream is a permutation of the write stream,
then memory is consistent!
Problem:
- Read and write sets grow very large \(\implies\) linear work for the prover
Solution:
- Use multi-set hashing
\textsf{MSH}(S_1 \cup S_2) \equiv \textsf{MSH}(S_1) + \textsf{MSH}(S_2)
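
The homomorphic property can be illustrated with a toy multi-set hash. This is only a sketch: summing SHA-256 digests modulo a prime is an assumption chosen for readability, it is not claimed to be the construction used in Spice, and it should not be treated as secure.

```python
import hashlib

P = 2**255 - 19  # a prime modulus, chosen only for this illustration

def h(element):
    """Hash a single (address, value, timestamp) tuple to an integer mod P."""
    digest = hashlib.sha256(repr(element).encode()).digest()
    return int.from_bytes(digest, "big") % P

def msh(multiset):
    """Toy multi-set hash: the sum of element hashes, so order does not matter."""
    return sum(h(e) for e in multiset) % P

s1 = [(0x2000, 10, 4), (0x2004, 20, 5)]
s2 = [(0x2008, 30, 10), (0x200C, 200, 11)]

union_hash = msh(s1 + s2)
sum_hash = (msh(s1) + msh(s2)) % P
permuted_hash = msh(list(reversed(s1 + s2)))

print(union_hash == sum_hash, union_hash == permuted_hash)
```

Because the hash is additive, it can be updated one tuple at a time, and two streams that are permutations of each other hash to the same value.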

Shout for ROM

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	200
...	...
...	...

\(\text{ROM}\)

Use one-hot encoding for addresses
PCS and sum-check work well with sparsity
\(T:\) total number of cycles in the program
\(K:\) total memory size
⚠️ Commit to \(\mathcal{O}(K \cdot T)\) witness?
- EC vs hash-based PCS
\(\textsf{ra}(k, j):\) \(k\)-th entry of the one-hot encoding of the address read at cycle \(j\)

\implies \textcolor{violet}{\textsf{rv}}(j) = \sum_{k \in [\ \text{rom table} \ ] } \textcolor{violet}{\textsf{ra}}(k, j) \cdot \textsf{val}(k)

RHS is multi-linear, just check at a random point
Still need to prove correctness of one-hot encoding
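
The read relation above can be checked directly on a tiny table. A minimal sketch, assuming a 4-entry ROM holding the values from the table and a hypothetical list `reads` of address indices (one per cycle):

```python
rom = [10, 20, 30, 200]   # val(k), so K = 4
reads = [0, 1, 3, 1]      # hypothetical address index read at each cycle j

def one_hot(index, size):
    """Return the one-hot vector of length `size` with a 1 at `index`."""
    return [1 if k == index else 0 for k in range(size)]

K = len(rom)
ra = [one_hot(a, K) for a in reads]  # ra[j][k] plays the role of ra(k, j)

# rv(j) = sum over k of ra(k, j) * val(k)
rv = [sum(ra[j][k] * rom[k] for k in range(K)) for j in range(len(reads))]
print(rv)  # [10, 20, 200, 20]
```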

[Figure: each address in the table is mapped to its one-hot encoding: \(0\ldots000\textcolor{red}{1},\ 0\ldots00\textcolor{red}{1}0,\ 0\ldots0\textcolor{red}{1}00,\ 0\ldots\textcolor{red}{1}000,\ \ldots,\ \textcolor{red}{1}\ldots0000\)]

Shout for ROM

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	200
...	...
...	...

\(\text{ROM}\)

Still need to prove correctness of one-hot encoding
- \(\textcolor{violet}{\textsf{ra}}(k, j) \in \{0, 1\}\) for all \(k, j\)
- \(\textcolor{violet}{\textsf{ra}}(k, j)\) is \(1\) for exactly one \(k\)

\implies \textcolor{violet}{\textsf{ra}}(k, j) \cdot (\textcolor{violet}{\textsf{ra}}(k, j) - 1) = 0

\implies \sum_{k \in [\text{ rom table }]} \textcolor{violet}{\textsf{ra}}(k, j) = 1
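
Both constraints are needed, as a small check shows. A minimal sketch over plain integers (in the protocol these are field elements and the constraints are proven with sum-check); the function name `is_one_hot_matrix` is an assumption for this example:

```python
def is_one_hot_matrix(ra):
    """ra[j][k] stands for ra(k, j); every row must be a one-hot vector."""
    # Booleanity: x * (x - 1) == 0 holds only for x in {0, 1}.
    booleanity = all(x * (x - 1) == 0 for row in ra for x in row)
    # Each row sums to 1, so exactly one entry is set once booleanity holds.
    unit_sum = all(sum(row) == 1 for row in ra)
    return booleanity and unit_sum

good = [[0, 1, 0, 0], [1, 0, 0, 0]]
two_ones = [[1, 1, 0, 0]]          # boolean, but the row sums to 2
not_boolean = [[2, -1, 0, 0]]      # row sums to 1, but entries are not bits

print(is_one_hot_matrix(good), is_one_hot_matrix(two_ones), is_one_hot_matrix(not_boolean))
```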

Prover commits to \((K-1)\) 0's and one 1 per read
⚠️ For a very large table \(K \approx 2^{64}\), commitment key size is \(K \times T\) 😱
Hint: use a slightly more efficient encoding!

Shout for ROM

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	200
...	...
...	...

\(\text{ROM}\)

Hint: use a slightly more efficient encoding!
Suppose \(K=16,\) then addr at index 6:
- one-hot: \(000000000\textcolor{red}{1}000000\)
- 2d one-hot: \((0\textcolor{red}{1}00, \ 00\textcolor{red}{1}0)\)

\begin{bmatrix} 0 \\ \textcolor{red}{1} \\ 0 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \\ \textcolor{red}{1} \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \\ 0 \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \\ 0 \cdot \begin{bmatrix} 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \textcolor{red}{1} \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
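
The tensor-product identity above can be reproduced for \(K = 16\). A minimal sketch in plain Python; lists are indexed from the left, matching the column vectors read top to bottom, and the helper names are assumptions for this example:

```python
def one_hot(index, size):
    """Return the one-hot vector of length `size` with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]

def kron(u, v):
    """Kronecker (tensor) product of two vectors."""
    return [a * b for a in u for b in v]

K, side = 16, 4
index = 6
hi, lo = divmod(index, side)           # 6 = 1 * 4 + 2

chunk_hi = one_hot(hi, side)           # [0, 1, 0, 0]
chunk_lo = one_hot(lo, side)           # [0, 0, 1, 0]
full = kron(chunk_hi, chunk_lo)

print(full == one_hot(index, K))       # True
print(len(chunk_hi) + len(chunk_lo), "entries instead of", K)
```

The two chunks together have 8 entries, compared with 16 for the plain one-hot vector; the gap widens quickly as \(K\) grows.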

[Figure: 2d one-hot encoding. Each address is encoded as a pair of one-hot vectors, each of length \(K^{\frac{1}{2}}\); for index 6, the pair \((0\textcolor{red}{1}00, \ 00\textcolor{red}{1}0) \equiv (2, 1)_4\).]

KZG commitment key size: \(K^{\frac{1}{2}} \cdot T\)
- For Jolt, \(K=2^{64},\) so key size: \(2^{32} \cdot T\)
Can generalise with parameter \(d\): key size \(K^{\frac{1}{d}} \cdot T\)
Trade-off: prover needs to commit to \(d\) ones per read (among \(d \cdot K^{\frac{1}{d}}\) entries) instead of one
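
The effect of \(d\) can be tabulated for \(K = 2^{64}\). A rough sketch, assuming each of the \(d\) chunks is itself a one-hot vector of length \(K^{\frac{1}{d}}\), so that one read contributes \(d\) ones among \(d \cdot K^{\frac{1}{d}}\) committed entries; the function name `shout_costs` is hypothetical:

```python
def shout_costs(log_k, d):
    """Return (log2 of chunk length K^(1/d), committed entries per read, ones per read)."""
    assert log_k % d == 0, "this sketch only handles d dividing log2(K)"
    log_chunk = log_k // d
    entries_per_read = d * 2**log_chunk
    ones_per_read = d
    return log_chunk, entries_per_read, ones_per_read

for d in (1, 2, 4, 8):
    print(d, shout_costs(64, d))
```

Larger \(d\) shrinks each chunk (and hence the key) at the cost of more ones to commit per read and a higher-degree read relation.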

Shout for ROM

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	200
...	...
...	...

\(\text{ROM}\)

Shout for general \(d\) (shown here for \(d = 2\)):
- Read address vectors: \(\textcolor{violet}{\textsf{ra}_1}(k_1, j), \ \textcolor{violet}{\textsf{ra}_2}(k_2, j)\)

To prove correctness of 2d one-hot encoding
- \(\textcolor{violet}{\textsf{ra}_i}(k_i, j) \in \{0, 1\}\) for all \(i, k_i, j\)
- \(\textcolor{violet}{\textsf{ra}_i}(k_i, j)\) is \(1\) for exactly one \(k_i\)

\implies \textcolor{violet}{\textsf{ra}_i}(k_i, j) \cdot (\textcolor{violet}{\textsf{ra}_i}(k_i, j) - 1) = 0

\implies \sum_{k_i} \textcolor{violet}{\textsf{ra}_i}(k_i, j) = 1

\implies \textcolor{violet}{\textsf{rv}}(j) = \sum_{k_1} \sum_{k_2} \textcolor{violet}{\textsf{ra}_1}(k_1, j) \cdot \textcolor{violet}{\textsf{ra}_2}(k_2, j) \cdot \textsf{val}((k_2 \ \| \ k_1))
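
The double sum can be evaluated directly for \(d = 2\) and \(K = 16\). A minimal sketch, assuming hypothetical ROM contents `val` and read addresses `reads`, with \(k_1\) the low base-4 digit and \(k_2\) the high one, so that \((k_2 \ \| \ k_1)\) is the index \(4 k_2 + k_1\):

```python
side = 4                             # K^(1/2) for K = 16
val = list(range(100, 116))          # hypothetical ROM contents, val[k]
reads = [6, 0, 15, 9]                # hypothetical address read at each cycle j

def one_hot(index, size):
    """Return the one-hot vector of length `size` with a 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]

ra1, ra2 = [], []
for a in reads:
    k2, k1 = divmod(a, side)         # a = k2 * side + k1
    ra1.append(one_hot(k1, side))    # ra1[j][k1] plays the role of ra_1(k_1, j)
    ra2.append(one_hot(k2, side))    # ra2[j][k2] plays the role of ra_2(k_2, j)

# rv(j) = sum_{k1} sum_{k2} ra_1(k1, j) * ra_2(k2, j) * val(k2 || k1)
rv = [
    sum(ra1[j][k1] * ra2[j][k2] * val[k2 * side + k1]
        for k1 in range(side) for k2 in range(side))
    for j in range(len(reads))
]
print(rv)  # [106, 100, 115, 109]
```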

Twist for RAM

Prove correctness of reads AND writes
Need to commit to memory state: not sparse ⚠️
Let's look at the changes in memory state!

Twist for RAM

Idea: commit to \(\Delta\)-memory \(\textcolor{lightgreen}{\textsf{inc}}(k, j)\) but prove statements about \(\textcolor{lightgreen}{\textsf{val}}(k, j)\)

\begin{aligned} \textcolor{lightgreen}{\textsf{inc}}(k, j) &= \textcolor{lightgreen}{\textsf{val}}(k, j + 1) - \textcolor{lightgreen}{\textsf{val}}(k, j) \\[3pt] &= \textcolor{lightgreen}{\textsf{wa}}(k, j) \cdot \left( \textcolor{lightgreen}{\textsf{wv}}(j) - \textcolor{lightgreen}{\textsf{val}}(k, j) \right) \end{aligned}
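
Both expressions for the increment can be compared on a small trace. A minimal sketch, assuming one write per cycle to a 4-cell memory that starts at zero; the list `writes` is hypothetical and ends with an overwrite of cell 0 to show a non-trivial increment:

```python
K = 4
writes = [(0, 10), (1, 20), (2, 30), (3, 200), (0, 15)]  # (address index, value written) per cycle j

# Replay the writes to obtain the dense memory state val[j][k] = val(k, j).
mem = [0] * K
val = [mem[:]]
for addr, wv in writes:
    mem[addr] = wv
    val.append(mem[:])

# inc(k, j) = wa(k, j) * (wv(j) - val(k, j)), with wa the one-hot write address.
inc_formula = [
    [(1 if k == addr else 0) * (wv - val[j][k]) for k in range(K)]
    for j, (addr, wv) in enumerate(writes)
]
# inc(k, j) = val(k, j + 1) - val(k, j)
inc_diff = [[val[j + 1][k] - val[j][k] for k in range(K)] for j in range(len(writes))]

print(inc_formula == inc_diff)
print(inc_formula[-1])  # [5, 0, 0, 0]
```

Each cycle changes at most one cell, so the increments are sparse even though the memory state itself is not.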

Twist for RAM

Let's summarise the relations we want to prove:

\implies \textcolor{violet}{\textsf{rv}}(j) = \sum_{k \in [\ \text{memory} \ ] } \textcolor{violet}{\textsf{ra}}(k, j) \cdot \textcolor{lightgreen}{\textsf{val}}(k, j)

address	value
0x2000	10
0x2004	20
0x2008	30
0x200C	200
...	...
...	...

\(\text{RAM}\)

But committing to \(\textcolor{lightgreen}{\textsf{val}}(k, j)\) is too expensive
Instead we commit to \(\textcolor{lightgreen}{\textsf{inc}}(k, j)\)
Next: check that value in memory is consistent with the value used in the program

\begin{aligned} \forall(k, j) \quad \textcolor{lightgreen}{\textsf{inc}}(k, j) &= \textcolor{lightgreen}{\textsf{wa}}(k, j) \cdot \left( \textcolor{lightgreen}{\textsf{wv}}(j) - \textcolor{lightgreen}{\textsf{val}}(k, j) \right) \end{aligned}

\forall(k, j) \quad \textcolor{lightgreen}{\textsf{val}}(k, j) := \sum_{j' < j} \textcolor{lightgreen}{\textsf{inc}}(k, j') = \sum_{j' \in [\text{ cycles }]} \textcolor{lightgreen}{\textsf{inc}}(k, j') \cdot \textsf{lt}(j', j)
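
The last relation says the memory state never has to be stored: it can be recomputed from the increments. A minimal sketch, assuming memory starts at zero and using hypothetical increments `inc` (with `inc[j][k]` standing for \(\textsf{inc}(k, j)\)) and hypothetical reads given as `(cycle, address index)` pairs:

```python
K, T = 4, 5
inc = [
    [10, 0, 0, 0],
    [0, 20, 0, 0],
    [0, 0, 30, 0],
    [0, 0, 0, 200],
    [5, 0, 0, 0],
]

def lt(a, b):
    """Less-than indicator: 1 if a < b, else 0."""
    return 1 if a < b else 0

def val(k, j):
    """val(k, j) = sum over all cycles j' of inc(k, j') * lt(j', j)."""
    return sum(inc[jp][k] * lt(jp, j) for jp in range(T))

reads = [(2, 1), (4, 0), (4, 3)]  # (cycle j, address index k)
# rv(j) = sum over k of ra(k, j) * val(k, j), with ra the one-hot read address.
rv = [sum((1 if k == addr else 0) * val(k, j) for k in range(K)) for j, addr in reads]
final_memory = [val(k, T) for k in range(K)]

print(rv)            # [20, 10, 200]
print(final_memory)  # [15, 20, 30, 200]
```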

Learnings

Currently, Jolt spends about 50% of its time on memory checking
To speed it up: can we use one-hot encoding for memory addresses?
- EC commitments are very fast when the witness is binary
- Hash-based commitments are also fast when working over binary tower fields
Additionally, permutation arguments based on quotienting are expensive
- Can we avoid "large" values when checking permutations?
Twist and Shout answer both of these questions positively

Twist and Shout - Journal Club

Suyash Bagad
