Parallel Fitness
Embarrassingly parallel Rastrigin evaluation (POP=4096, DIM=2000)
Sequential Fusion
1000-timestep fused financial simulation (POP=10000)
Matrix Throughput
Parallel 16×16 matrix multiplication throughput
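As a rough illustration of what the Parallel Fitness benchmark computes, here is a minimal CPU sketch of the standard Rastrigin function applied to a small population. It is not the page's WebGPU shader. The helper names (`rastrigin`, `evaluate_population`) and the tiny population size are assumptions chosen for readability. Each individual is scored independently of the others, which is what makes the workload embarrassingly parallel.

```python
import math
import random


def rastrigin(x, a=10.0):
    """Standard Rastrigin function: a*n + sum(x_i^2 - a*cos(2*pi*x_i))."""
    return a * len(x) + sum(xi * xi - a * math.cos(2.0 * math.pi * xi) for xi in x)


def evaluate_population(population):
    """Score every individual; no individual depends on another,
    so on a GPU each one could be handled by its own thread."""
    return [rastrigin(individual) for individual in population]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical small sizes for illustration (the benchmark uses POP=4096, DIM=2000).
    pop = [[rng.uniform(-5.12, 5.12) for _ in range(20)] for _ in range(8)]
    fitness = evaluate_population(pop)
    print("best fitness:", min(fitness))
```

The global minimum is 0 at the origin, which gives a quick sanity check for any implementation.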
When you click Run, your GPU model and benchmark results are saved anonymously. No personal information is collected. Privacy policy
The science behind the benchmarks
These benchmarks are based on research demonstrating that fusing sequential fitness evaluations into a single GPU compute shader dispatch achieves 159× the throughput of PyTorch's per-step dispatch. A native Metal baseline confirms that Chrome's browser overhead is only 48%, yet WebGPU still outperforms PyTorch MPS running natively.
Gunaydin, A.B. (2026). Single-Kernel Fusion for Sequential Fitness Evaluation via WebGPU Compute Shaders. doi:10.5281/zenodo.19331834
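The difference between per-step dispatch and single-kernel fusion can be sketched with a toy CPU model. This is not the paper's implementation and it says nothing about real GPU timings. The update rule `step` and the function names are hypothetical stand-ins. The sketch only shows that both strategies compute the same result while the fused version issues one "dispatch" instead of one per timestep.

```python
def step(value, t):
    # Hypothetical per-timestep update; stands in for one step of a simulation.
    return value * (1.0 + 0.001 * ((t % 7) - 3))


def run_per_step(state, timesteps):
    """One 'dispatch' per timestep: the whole population advances one step,
    then control returns to the host before the next step is launched."""
    dispatches = 0
    for t in range(timesteps):
        state = [step(v, t) for v in state]
        dispatches += 1
    return state, dispatches


def run_fused(state, timesteps):
    """One 'dispatch' in total: each individual runs its entire
    time loop inside the kernel before anything returns to the host."""
    def kernel(v):
        for t in range(timesteps):
            v = step(v, t)
        return v

    return [kernel(v) for v in state], 1


if __name__ == "__main__":
    initial = [1.0, 2.0, 3.0]
    per_step_result, per_step_dispatches = run_per_step(initial, 1000)
    fused_result, fused_dispatches = run_fused(initial, 1000)
    print(per_step_dispatches, fused_dispatches)
    print(per_step_result == fused_result)
```

In this model the fused version removes 999 of the 1000 launches for a 1000-timestep run. On real hardware the benefit depends on how large the per-dispatch overhead is relative to the work done in each step.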