Regression Language Models for Code
Paper • 2509.26476 • Published
Predicts GPU kernel/operation runtime in milliseconds from source code and GPU hardware specifications.
| Model | R² | RMSE | Spearman ρ | MAPE % |
|---|---|---|---|---|
| GBR | 0.9923 | 0.0728 | 0.9264 | 16.5% |
| RF | 0.9924 | 0.0724 | 0.9277 | 16.3% |
| NN | 0.9932 | 0.0687 | 0.9187 | 17.0% |
| Ensemble | 0.9930 | 0.0693 | 0.9272 | 16.3% |
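The ensemble row corresponds to combining the three base regressors. A minimal sketch of such an ensemble, assuming simple prediction averaging with scikit-learn models (the feature layout and data below are synthetic placeholders, not the paper's dataset):

```python
# Hedged sketch: average the predictions of a GBR, an RF, and a small MLP,
# mirroring the GBR / RF / NN / Ensemble rows above. All data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 6))                        # stand-in code/hardware features
y = X @ rng.random(6) + 0.1 * rng.random(200)   # stand-in (log-)runtime target

models = [
    GradientBoostingRegressor(random_state=0),
    RandomForestRegressor(random_state=0),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
]
for m in models:
    m.fit(X, y)

# Ensemble prediction = unweighted mean of the three base predictions
preds = np.mean([m.predict(X) for m in models], axis=0)
```

Whether the paper uses unweighted averaging or a learned combination is not stated here; the mean is the simplest baseline.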
| GPU | FP32 TFLOPS | Memory BW | VRAM |
|---|---|---|---|
| NVIDIA T4 | 8.1 | 320 GB/s | 16 GB |
| NVIDIA V100 | 15.7 | 900 GB/s | 32 GB |
| NVIDIA A10G | 31.2 | 600 GB/s | 24 GB |
| NVIDIA A100 40GB | 19.5 | 1555 GB/s | 40 GB |
| NVIDIA A100 80GB | 19.5 | 2039 GB/s | 80 GB |
| NVIDIA L4 | 30.3 | 300 GB/s | 24 GB |
| NVIDIA L40S | 91.6 | 864 GB/s | 48 GB |
| NVIDIA RTX 3090 | 35.6 | 936 GB/s | 24 GB |
| NVIDIA RTX 4090 | 82.6 | 1008 GB/s | 24 GB |
| NVIDIA H100 SXM | 67.0 | 3350 GB/s | 80 GB |
| NVIDIA H100 PCIe | 48.0 | 2039 GB/s | 80 GB |
| NVIDIA RTX A6000 | 38.7 | 768 GB/s | 48 GB |
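The hardware table above can be fed to a regressor as plain numeric features. A sketch of one way to do that, assuming a `[TFLOPS, bandwidth, VRAM]` feature layout (the layout is an illustration, not the paper's encoding; values are copied from the table, abbreviated to a few rows):

```python
# Hedged sketch: represent GPU specs as numeric feature vectors.
# (fp32_tflops, mem_bw_gbps, vram_gb), values taken from the table above.
GPU_SPECS = {
    "NVIDIA T4":        (8.1,  320, 16),
    "NVIDIA V100":      (15.7, 900, 32),
    "NVIDIA A100 40GB": (19.5, 1555, 40),
    "NVIDIA H100 SXM":  (67.0, 3350, 80),
}

def hardware_features(gpu_name):
    """Return [fp32_tflops, mem_bw_gbps, vram_gb] for a known GPU."""
    tflops, bw, vram = GPU_SPECS[gpu_name]
    return [tflops, bw, vram]

print(hardware_features("NVIDIA H100 SXM"))  # [67.0, 3350, 80]
```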
Supported operation types: matmul, conv2d, attention, transformer_block, linear, layernorm, batchnorm, softmax, embedding, elementwise, reduction, pooling, FFT, sort, loss+backward
```python
# See the Gradio demo for interactive use, or load models directly:
import pickle

with open('model_gbr.pkl', 'rb') as f:
    model = pickle.load(f)
```
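A self-contained round-trip of the loading pattern above, assuming the released `.pkl` files hold scikit-learn regressors (the name `model_gbr.pkl` suggests a `GradientBoostingRegressor`); a toy model trained on synthetic data stands in for the real checkpoint:

```python
# Hedged sketch: pickle round-trip with a toy GBR standing in for the
# released checkpoint. Data and model here are synthetic placeholders.
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((50, 4))
y = X.sum(axis=1)
toy = GradientBoostingRegressor(random_state=0).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model_gbr.pkl")
with open(path, "wb") as f:
    pickle.dump(toy, f)

with open(path, "rb") as f:
    model = pickle.load(f)

# Predict for one sample; with the real model this would be a runtime in ms.
runtime_ms = model.predict(X[:1])
print(runtime_ms.shape)  # (1,)
```

The exact feature vector the released models expect (code-derived features plus hardware specs) is defined by the repository's preprocessing, so consult the demo code before calling `predict` on real inputs.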