WebGPU Bench

Zero-TVM — Run Phi-3 in your browser with 10 hand-written shaders

In February 2026, Hugging Face shipped Transformers.js v4, a C++ WebGPU runtime built with Microsoft's ONNX Runtime team, as the production answer for browser LLM inference. Zero-TVM shows that for Phi-3 Mini specifically, the answer can instead be 10 kernel roles across 27 WGSL files and ~2,000 lines of TypeScript. No compiler, no WASM, no server. It requires WebGPU with the shader-f16 feature; the first load downloads ~2.1 GB of Q4F16 weights, which are cached for subsequent sessions.
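The shader-f16 requirement can be checked up front, before committing to the multi-gigabyte weight download. A minimal sketch, assuming a hypothetical `initDevice` helper that is not part of Zero-TVM's actual API; structural types stand in for the browser's WebGPU globals so the sketch is self-contained:

```typescript
// Structural stand-ins for the browser's GPUAdapter / navigator.gpu,
// so this sketch compiles outside a browser environment.
interface AdapterLike {
  features: Set<string>;
  requestDevice(desc: { requiredFeatures: string[] }): Promise<unknown>;
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

// Hypothetical helper (not Zero-TVM's real API): fail fast if WebGPU or
// shader-f16 is missing, otherwise return a device with f16 enabled.
async function initDevice(gpu: GpuLike | undefined): Promise<unknown> {
  if (!gpu) throw new Error("WebGPU is not supported in this browser");
  const adapter = await gpu.requestAdapter();
  if (!adapter) throw new Error("No WebGPU adapter available");
  if (!adapter.features.has("shader-f16")) {
    throw new Error("shader-f16 unsupported; the Q4F16 kernels need it");
  }
  // Requesting the feature here lets WGSL modules use `enable f16;`.
  return adapter.requestDevice({ requiredFeatures: ["shader-f16"] });
}
```

In a browser this would be called as `initDevice(navigator.gpu)`, since `navigator.gpu` matches the `GpuLike` shape.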

At a glance:

- 3.8B Phi-3 Mini parameters
- 10 kernel roles
- 27 WGSL files
- 228 dispatches per token
- 33 KB of gzipped JS
- ~40 tok/s on an M2 Pro
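The one-time ~2.1 GB weight download follows the usual cache-then-network pattern: serve a shard from the browser cache if present, otherwise fetch and store it. A hedged sketch of that pattern; `fetchWeightShard`, the `CacheLike` interface, and the injected `fetchFn` parameter are illustrative assumptions, not Zero-TVM's actual loader:

```typescript
// Structural stand-in for the browser Cache API (caches.open(...) result),
// so the sketch runs outside a browser.
interface CacheLike {
  match(url: string): Promise<Response | undefined>;
  put(url: string, res: Response): Promise<void>;
}

// Illustrative loader (not Zero-TVM's real code): return a cached shard if
// available; otherwise download it, store a copy, and return the bytes.
async function fetchWeightShard(
  cache: CacheLike,
  url: string,
  fetchFn: (u: string) => Promise<Response>,
): Promise<ArrayBuffer> {
  const cached = await cache.match(url);
  if (cached) return cached.arrayBuffer(); // cache hit: no network traffic
  const res = await fetchFn(url);
  if (!res.ok) throw new Error(`Failed to download ${url}: ${res.status}`);
  await cache.put(url, res.clone()); // store a clone; the original body is consumed below
  return res.arrayBuffer();
}
```

In a browser, the cache would come from `caches.open(...)` and `fetchFn` would simply be the global `fetch`; on a second visit, every shard is served locally.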