ICICLE - PyTorch for ZK

Suyash Bagad

ZK Tokyo 2025

Ingonyama

Challenges with ZKP

  • SNARK (or STARK) proof generation is slow
  • Scope for parallelisability in modern ZKP provers
    [Figure labels: N polynomials; costs \mathcal{O}(N) and \mathcal{O}(N \log N); primitives MSM, FFT, Sumcheck]
  • Offload intensive compute to the existing GPU infrastructure
    [Figure labels: prover.cpp, prover.rs, prover.go; device management, CPU-GPU data transfer, memory management]
  • Enter ICICLE
    • A library for developers to implement cutting-edge ZKPs 
    • Super fast and easy to use
    • From proof of concept to production-ready solutions
  • We want devs to write math and leave the acceleration to ICICLE! 

ICICLE  アイシクル

  • High-performance cryptographic library for accelerating ZKPs
    • Versatile: Multi-hardware support for diverse environments
    • Efficient: Optimized for ZK computations
    • Scalable: Easy-to-use and scalable solution for developers
  • Optimized primitives like MSM, NTT, Keccak hash, and more
  • Built-in libraries for multiple fields and curves
  • Seamless DevEx with bindings in C++, Rust and Go
  • Backend-agnostic with GPU and CPU support (future Metal? ASICs?)
  • Designed for easy integration and extension
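
To make the NTT primitive concrete, here is a minimal pure-Python sketch of a radix-2 number-theoretic transform over the toy field F_17. It only illustrates the math: the names `ntt`, `intt` and `poly_mul` and the tiny modulus are assumptions made for this sketch, not ICICLE's API, which targets large cryptographic fields and accelerated backends.

```python
# Toy radix-2 NTT over F_17 (illustrative only; not ICICLE's API).
P = 17      # small prime modulus, chosen so that 8 divides P - 1
OMEGA = 9   # a primitive 8th root of unity mod 17 (9^4 = -1, 9^8 = 1)


def ntt(coeffs, omega, p=P):
    """Evaluate the polynomial with these coefficients at omega^0 .. omega^(n-1).

    len(coeffs) must be a power of two and omega a primitive n-th root of unity.
    """
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    # Split into even and odd coefficients and recurse with omega^2.
    even = ntt(coeffs[0::2], omega * omega % p, p)
    odd = ntt(coeffs[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p           # butterfly: upper half
        out[i + n // 2] = (even[i] - t) % p  # uses omega^(n/2) = -1
        w = w * omega % p
    return out


def intt(evals, omega, p=P):
    """Inverse transform: run the NTT with omega^-1 and scale by n^-1."""
    n = len(evals)
    inv_n = pow(n, -1, p)
    inv_omega = pow(omega, -1, p)
    return [x * inv_n % p for x in ntt(evals, inv_omega, p)]


def poly_mul(a, b, omega=OMEGA, p=P):
    """Multiply two zero-padded polynomials pointwise in the evaluation domain."""
    fa, fb = ntt(a, omega, p), ntt(b, omega, p)
    return intt([x * y % p for x, y in zip(fa, fb)], omega, p)
```

For example, `poly_mul([1, 2, 0, 0, 0, 0, 0, 0], [3, 4, 0, 0, 0, 0, 0, 0])` multiplies 1 + 2x by 3 + 4x; the inputs are padded to length 8 so the product fits in the transform size.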

Nooo... To make my ZK prover faster, I need to accelerate MSM and NTT for the BN254 curve. I need GPU/CPU coordination, and I'll have to learn CUDA even though my prover is written in Rust. Ughhh

icicle go brrrr

Architecture

[Figure: ICICLE architecture. Credit: Karthik Inbasekar]

  • Primitives: modular arithmetic, NTT, Merkle trees, MSM (G1 & G2), ECNTT, hashes, vector ops
  • Building blocks: \mathbb{F}, \mathbb{G} (EC group operations), \mathbb{L} (linear algebra), \mathbb{P} (Polynomial API, univariate and multivariate)
  • Language bindings: C++, Rust, Go

  • Front-end
    • Multi-language support
    • Abstracts the complexity of working with different backends
  • CUDA Backend
    • Optimized for NVIDIA GPUs
  • Multi-Device Support
  • Custom backend possible
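
The split between one front-end and interchangeable backends can be pictured as a small dispatch table. Everything in this sketch is hypothetical and for illustration only: `register_backend`, `set_device`, `vector_add` and the device name `MY_ACCEL` are invented here and are not ICICLE's API. The point is that the front-end call looks up whichever implementation is registered for the active device, so caller code stays the same when the backend changes.

```python
# Hypothetical front-end/backend dispatch (all names invented for illustration).
from typing import Callable, Dict, List

P = 2**31 - 1  # toy prime modulus standing in for a real field

_BACKENDS: Dict[str, Dict[str, Callable]] = {}
_active = "CPU"


def register_backend(device: str, op: str, impl: Callable) -> None:
    """Register an implementation of `op` for the given device name."""
    _BACKENDS.setdefault(device, {})[op] = impl


def set_device(device: str) -> None:
    """Select which registered backend subsequent front-end calls use."""
    global _active
    if device not in _BACKENDS:
        raise ValueError(f"no backend registered for {device!r}")
    _active = device


def vector_add(a: List[int], b: List[int]) -> List[int]:
    """Front-end call: same signature whatever backend is active."""
    return _BACKENDS[_active]["vector_add"](a, b)


# Reference CPU backend.
def _cpu_vector_add(a, b):
    return [(x + y) % P for x, y in zip(a, b)]


register_backend("CPU", "vector_add", _cpu_vector_add)

# A "custom backend": a stand-in that records calls, then does the same math.
calls: List[int] = []


def _custom_vector_add(a, b):
    calls.append(len(a))
    return [(x + y) % P for x, y in zip(a, b)]


register_backend("MY_ACCEL", "vector_add", _custom_vector_add)
```

Calling `set_device("MY_ACCEL")` before `vector_add(...)` routes the same call through the custom implementation without touching the caller.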

PyTorch and ICICLE

ICICLE in Action: NTT

[Animation: the host holds poly 1, poly 2, ..., poly 10. One polynomial at a time is copied to the device, transformed with ntt(), and copied back before the next one starts.]

T_{\textsf{naive}} = 10 \cdot (t_{h2d} + t_{ntt} + t_{d2h})

Can we perform computation and data transfer simultaneously in ICICLE?

[Animation: the host again holds poly 1, poly 2, ..., poly 10, but the device side is now split in two: one for compute, one for data transfer. A polynomial such as poly 3 can be in transfer while ntt() runs on another.]

How did we read and write simultaneously?

T_{\textsf{opt}} = 10 \cdot t_{ntt}
\therefore \ \frac{T_{\textsf{naive}}}{T_{\textsf{opt}}} \approx 3 \quad \text{(assuming } t_{h2d} \approx t_{ntt} \approx t_{d2h}\text{)}
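
As a sanity check on the two formulas, the Python sketch below models each polynomial as three stages (host-to-device copy, NTT, device-to-host copy) and compares a fully serial schedule with one where the two copy directions and the compute run concurrently. The stage times are placeholder values, not measurements, and the model assumes each engine handles one polynomial at a time with enough device buffers available.

```python
# Simple timing model; stage durations are illustrative placeholders.

def naive_time(n, t_h2d, t_ntt, t_d2h):
    """Serial schedule: copy in, compute, copy out, then start the next one."""
    return n * (t_h2d + t_ntt + t_d2h)


def pipelined_time(n, t_h2d, t_ntt, t_d2h):
    """Makespan when h2d copies, NTTs and d2h copies run on separate engines."""
    h2d_free = ntt_free = d2h_free = 0.0
    for _ in range(n):
        # Upload starts as soon as the upload engine is free.
        h2d_done = h2d_free + t_h2d
        h2d_free = h2d_done
        # Compute needs both the data on the device and a free compute engine.
        ntt_done = max(ntt_free, h2d_done) + t_ntt
        ntt_free = ntt_done
        # Download needs the result and a free download engine.
        d2h_done = max(d2h_free, ntt_done) + t_d2h
        d2h_free = d2h_done
    return d2h_free


if __name__ == "__main__":
    for n in (10, 1000):
        serial = naive_time(n, 1.0, 1.0, 1.0)
        overlapped = pipelined_time(n, 1.0, 1.0, 1.0)
        print(n, serial, overlapped, serial / overlapped)
```

With equal stage times the overlapped schedule costs one NTT per polynomial plus a short fill and drain at the ends, so the speedup approaches 3 as the batch grows; the slide's T_opt drops that fill-and-drain term.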

[Animation: on the device, a new input is written while the previous output is read back.]
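
The "new input" and "output" slots suggest a double-buffering pattern: while the device computes on one buffer, the next input is uploaded and the previous output is downloaded. Below is a CPU-only Python sketch of that control flow using background threads for the transfers. `fake_h2d`, `fake_ntt` and `fake_d2h` are stand-ins invented for this example; a real implementation would use the backend's asynchronous copies instead, and Python threads here only illustrate the ordering, not a real speedup.

```python
# Double-buffering control flow with stand-in transfer and compute functions.
from concurrent.futures import ThreadPoolExecutor


def fake_h2d(poly):
    """Stand-in for a host-to-device copy."""
    return list(poly)


def fake_ntt(buf):
    """Stand-in for the device computation (doubles each entry)."""
    return [2 * x for x in buf]


def fake_d2h(buf):
    """Stand-in for a device-to-host copy."""
    return list(buf)


def run_overlapped(polys):
    """Process polys in order while transfers run in the background."""
    results = []
    if not polys:
        return results
    with ThreadPoolExecutor(max_workers=2) as copier:
        next_in = copier.submit(fake_h2d, polys[0])  # prefetch the first input
        pending_out = None
        for i in range(len(polys)):
            current = next_in.result()  # wait until this input has arrived
            if i + 1 < len(polys):
                # Start uploading the next input before computing.
                next_in = copier.submit(fake_h2d, polys[i + 1])
            out = fake_ntt(current)  # compute on the current buffer
            if pending_out is not None:
                results.append(pending_out.result())  # collect previous output
            pending_out = copier.submit(fake_d2h, out)  # download in background
        results.append(pending_out.result())  # collect the last output
    return results
```

Results come back in input order because each download is collected before the next one is queued.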

ICICLE in Action: NTT

  • Follow the instructions in the Google Colab notebook

Roadmap

  • Latest updates (v3.4 and v3.3) include:
    • Improved MSM and NTT performance on CPU
    • New primitives such as the BLAKE3 hash and the KoalaBear field
  • Work in progress on:
    • Sumcheck as a primitive
    • Other backends like Metal and WebGPU (for client-side proving)

  • v1: Accelerated primitives
  • v2: Polynomial API
  • v3: Multi-platform

Get Involved

  • ICICLE is open-source, contributions are most welcome!
  • Active grant program of $100,000 💵
  • Leave a ⭐ on the ICICLE repo: github.com/ingonyama-zk/icicle
