Research preprint

Your browser just became a supercomputer

We proved that browsers can run GPU computation at near-native speed. No download. No account. No expensive hardware. Just open a page.

GPU computing has a gate

For decades, running computation on a graphics card meant navigating three barriers that kept most of the world locked out.

🔒

Hardware lock-in

CUDA only runs on NVIDIA GPUs. If you have a Mac, an AMD card, or an Intel integrated chip — you're out.

🧩

Software complexity

Python environments, CUDA drivers, cuDNN versions, framework dependencies. One mismatch and nothing works.

💰

Cost

Cloud GPU instances cost $2–4/hour. A university lab without GPU budget simply can't participate.

We proved there's another way

WebGPU is a new browser standard that gives web pages direct access to your graphics card. We showed it's fast enough for real scientific computation.

2,865× — Apple Silicon average (across 592 real-world devices)
623× — Android phones average (Qualcomm Adreno)
0 — things to install; just open Chrome

“For fitness functions with sequential dependencies — common in reinforcement learning, financial simulation, and control systems — custom compute shaders dramatically outperform framework-based GPU code.”

From the preprint, Section 4.2

Who this is for

This isn't about replacing data centers. It's about giving GPU access to the people who never had it.

🎓

The student who can't afford a GPU cluster

A computer science student in Lagos, Bangalore, or rural anywhere can now run real GPU-accelerated experiments on their $400 laptop. Open a browser tab, not a grant application.

👩‍🏫

The teacher who wants to show, not just tell

Instead of slides about parallel computing, show it running live in the classroom. Every student's laptop becomes a GPU workstation. No lab setup, no admin permissions.

🔬

The researcher who wants reproducible science

“To reproduce our results, open this URL.” Not “install Python 3.10.12, CUDA 12.1, cuDNN 8.9.7, match the exact driver version.” Reproducibility should be one click.

🚀

The startup that can't justify cloud bills yet

A biotech team in Nairobi running molecular screening. A fintech in São Paulo backtesting strategies. Their scientists' laptops ARE the compute cluster. No AWS required.

How it works (simply)

The technical details are in the preprint. Here's the intuition.

1

Your GPU is already powerful

The graphics chip in your laptop — whether it's Apple Silicon, AMD, Intel, or NVIDIA — is a parallel processor with thousands of cores. It spends most of its time idle.

2

Browsers can now talk to it

WebGPU is a new web standard (shipping in Chrome since 2023) that lets web pages run computation directly on your GPU. Not just graphics — real number crunching.
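Here is a minimal sketch of what that access looks like in code. The WGSL shader below (it just doubles every element of a buffer) and all buffer sizes are illustrative assumptions, not the paper's actual kernel — only the overall shape (shader module → pipeline → bind group → dispatch) is the standard WebGPU flow.

```javascript
// Illustrative WGSL compute shader: doubles every element of a storage buffer.
// (Not the paper's kernel -- just the minimal shape of a WebGPU dispatch.)
const shaderCode = `
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  data[id.x] = data[id.x] * 2.0;
}`;

// Browser-only: acquire the GPU and run one compute dispatch.
async function runOnGpu(input) {
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();

  // Upload the input array into a GPU storage buffer.
  const buffer = device.createBuffer({
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  new Float32Array(buffer.getMappedRange()).set(input);
  buffer.unmap();

  // Compile the shader and bind the buffer to it.
  const pipeline = device.createComputePipeline({
    layout: 'auto',
    compute: {
      module: device.createShaderModule({ code: shaderCode }),
      entryPoint: 'main',
    },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer } }],
  });

  // Record and submit a single compute pass.
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(input.length / 64));
  pass.end();
  device.queue.submit([encoder.finish()]);
}
```

No driver install, no SDK: this runs in any page served over HTTPS in a WebGPU-capable browser.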

3

We figured out how to make it fast

Instead of sending thousands of small tasks to the GPU one by one (the way frameworks like PyTorch do), we pack the entire computation into a single GPU program. One dispatch instead of 22,500. The browser adds only 48% overhead versus native code, and the result is still faster than PyTorch.
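The arithmetic behind that intuition fits in a toy cost model. The 22,500 dispatch count comes from the text above; the per-dispatch overhead and total compute time below are illustrative assumptions, not the paper's measurements.

```javascript
// Toy cost model: total time = (number of dispatches x per-dispatch overhead)
// plus the actual on-GPU compute time, which is the same either way.
const DISPATCH_OVERHEAD_US = 50; // assumed cost to launch one GPU task
const COMPUTE_US = 2_000;        // assumed total useful compute time

function totalTimeUs(dispatches) {
  return dispatches * DISPATCH_OVERHEAD_US + COMPUTE_US;
}

const unfused = totalTimeUs(22_500); // one dispatch per small task
const fused = totalTimeUs(1);        // everything packed into one dispatch
const speedup = unfused / fused;
// The unfused run is dominated by launch overhead; the fused run pays it once.
```

With these assumed numbers the unfused run spends over 99% of its time on dispatch overhead, which is why collapsing 22,500 launches into one changes the picture so much.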

4

We proved it with real benchmarks

30 independent runs per experiment. Statistical tests. Comparisons against 8 systems on 2 hardware platforms. Not hype — evidence.

What actually changes

This isn't theoretical. Here's what's different tomorrow.

Before

“Reproduce our results” → Install Python 3.10.12 → Install CUDA 12.1 → Install cuDNN 8.9.7 → Match driver version → Debug for 3 hours → Maybe it works

After

Click this link. Results run in your browser. Verified in 30 seconds.

Before

“I need GPU compute” → AWS account → GPU instance ($2–4/hr) → DevOps setup → $5,000/month bill → Need funding before building

After

Users' own GPUs do the work. Server cost: $0. Ship on day one.

Before

“Today we'll learn parallel computing” → Show slides → Students nod → Nobody runs anything because the school has no GPU lab

After

“Open this URL on your laptop.” 30 students run GPU computation simultaneously. They see it, touch it, modify it.

Before

“Run screening on patient data” → 6-month legal review to upload to cloud → HIPAA/GDPR audit → $200K contract → Finally start work

After

Data never leaves the laptop. GPU compute runs in the browser. Compliance by architecture, not by contract.

The result that surprised us

The paper measured 159–720× on two machines. Then 592 people ran it on their own devices, and the numbers got bigger.

In the paper, we tested across four GPU APIs on two hardware platforms. On a Tesla T4: hand-fused CUDA 720×, JAX lax.scan 172×, Triton 27×. On an M2 Pro: WebGPU 159× over PyTorch MPS. The pattern was consistent: fusion eliminates dispatch overhead.

Since publishing, 592 devices have confirmed this — and the real-world numbers are larger. Apple Silicon averages 2,865×. Qualcomm Adreno (the chip in most Android phones) averages 623×. NVIDIA desktops average 79×.

Why are the real-world numbers bigger?

The paper's benchmarks ran on two machines: an Apple M2 Pro laptop and a Tesla T4 server. Both are fast GPUs with efficient command dispatching. That's why the speedups were “only” 159–720×.

Real-world devices include phones, tablets, Chromebooks, and laptops with integrated GPUs — hardware that was never designed for compute workloads. These GPUs have much worse dispatch overhead.

Kernel fusion eliminates dispatch overhead, so the worse a device is at dispatching, the more it benefits. NVIDIA desktop GPUs (efficient dispatching) see ~79×. Apple Silicon laptops see ~2,865×. Android phones see ~623×. The devices that need fusion most benefit from it most.
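The same toy cost model makes this concrete. Every number below — per-dispatch costs, compute time, kernel count, and the two hypothetical devices — is an illustrative assumption, chosen only to show the direction of the effect, not to match the measured 79× / 623× / 2,865× figures.

```javascript
// Fusion speedup = time with N separate dispatches / time with 1 dispatch,
// holding the useful compute time fixed.
function fusionSpeedup(dispatchUs, computeUs, nDispatches) {
  const unfused = nDispatches * dispatchUs + computeUs;
  const fused = 1 * dispatchUs + computeUs;
  return unfused / fused;
}

const N = 10_000;                            // small kernels per run (assumed)
const desktop = fusionSpeedup(5, 5_000, N);  // device with cheap dispatching
const phone = fusionSpeedup(80, 5_000, N);   // device with costly dispatching
// phone > desktop: the worse the dispatch path, the larger the win from fusion.
```

Same compute, same kernel count — the only variable is dispatch cost, and the slow-dispatch device gains roughly an order of magnitude more.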

See it for yourself

Run real GPU benchmarks on your hardware, right now, in your browser.

Every result from every device is public. No cherry-picking. Verify any claim yourself.