WebGPU Bench

Zero-TVM — Run Phi-3 in your browser with 10 hand-written shaders

In February 2026, Hugging Face shipped Transformers.js v4, a C++ WebGPU runtime built with Microsoft's ONNX Runtime team, as the production answer for browser LLM inference. Zero-TVM shows that for Phi-3 Mini specifically, the answer can instead be 10 kernel roles across 27 WGSL files and ~2,000 lines of TypeScript. No compiler, no WASM, no server. It requires WebGPU with the shader-f16 feature; the first load downloads ~2.1 GB of Q4F16 weights, which are cached for subsequent sessions.
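The shader-f16 requirement can be checked up front, before committing to the multi-gigabyte weight download. A minimal sketch, assuming a hypothetical `initDevice` helper that is not part of Zero-TVM's actual API; structural types stand in for the browser's WebGPU globals so the sketch is self-contained:

```typescript
// Structural stand-ins for the browser's GPUAdapter / navigator.gpu,
// so this sketch compiles outside a browser environment.
interface AdapterLike {
  features: Set<string>;
  requestDevice(desc: { requiredFeatures: string[] }): Promise<unknown>;
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

// Hypothetical helper (not Zero-TVM's real API): fail fast if WebGPU or
// shader-f16 is missing, otherwise return a device with f16 enabled.
async function initDevice(gpu: GpuLike | undefined): Promise<unknown> {
  if (!gpu) throw new Error("WebGPU is not supported in this browser");
  const adapter = await gpu.requestAdapter();
  if (!adapter) throw new Error("No WebGPU adapter available");
  if (!adapter.features.has("shader-f16")) {
    throw new Error("shader-f16 unsupported; the Q4F16 kernels need it");
  }
  // Requesting the feature here lets WGSL modules use `enable f16;`.
  return adapter.requestDevice({ requiredFeatures: ["shader-f16"] });
}
```

In a browser this would be called as `initDevice(navigator.gpu)`, since `navigator.gpu` matches the `GpuLike` shape.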

At a glance:

- 3.8B Phi-3 Mini parameters
- 10 kernel roles
- 27 WGSL files
- 228 dispatches per token
- 33 KB of gzipped JS
- ~40 tok/s on an M2 Pro
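The one-time ~2.1 GB weight download follows the usual cache-then-network pattern: serve a shard from the browser cache if present, otherwise fetch and store it. A hedged sketch of that pattern; `fetchWeightShard`, the `CacheLike` interface, and the injected `fetchFn` parameter are illustrative assumptions, not Zero-TVM's actual loader:

```typescript
// Structural stand-in for the browser Cache API (caches.open(...) result),
// so the sketch runs outside a browser.
interface CacheLike {
  match(url: string): Promise<Response | undefined>;
  put(url: string, res: Response): Promise<void>;
}

// Illustrative loader (not Zero-TVM's real code): return a cached shard if
// available; otherwise download it, store a copy, and return the bytes.
async function fetchWeightShard(
  cache: CacheLike,
  url: string,
  fetchFn: (u: string) => Promise<Response>,
): Promise<ArrayBuffer> {
  const cached = await cache.match(url);
  if (cached) return cached.arrayBuffer(); // cache hit: no network traffic
  const res = await fetchFn(url);
  if (!res.ok) throw new Error(`Failed to download ${url}: ${res.status}`);
  await cache.put(url, res.clone()); // store a clone; the original body is consumed below
  return res.arrayBuffer();
}
```

In a browser, the cache would come from `caches.open(...)` and `fetchFn` would simply be the global `fetch`; on a second visit, every shard is served locally.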