GPU Runtime Predictor 🚀⚡

Predicts GPU kernel/operation runtime in milliseconds from source code and GPU hardware specifications.

How It Works

  1. Code Feature Extraction: Analyzes source code to extract 48 features (tensor dimensions, operation types, complexity indicators)
  2. GPU Feature Encoding: Uses 12 hardware specs (CUDA cores, memory bandwidth, compute capability, etc.)
  3. ML Prediction: Ensemble of Gradient Boosted Trees + Random Forest + Neural Network
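The three stages above can be sketched end to end as follows. This is an illustrative toy, not the card's actual implementation: the feature names, the specific code heuristics, and the 48/12 split used here are assumptions based only on the counts stated above.

```python
import numpy as np

def extract_code_features(source: str) -> np.ndarray:
    """Toy stand-in for the 48 code features (op counts, complexity proxies)."""
    feats = np.zeros(48)
    feats[0] = source.count("matmul")    # operation-type indicator (assumed)
    feats[1] = source.count("conv2d")
    feats[2] = len(source.splitlines())  # crude complexity proxy (assumed)
    return feats

def encode_gpu(specs: dict) -> np.ndarray:
    """Toy stand-in for the 12 hardware-spec features."""
    keys = ["cuda_cores", "mem_bw_gbs", "fp32_tflops", "vram_gb"]  # names assumed
    base = [float(specs.get(k, 0.0)) for k in keys]
    return np.array(base + [0.0] * (12 - len(base)))

def predict_runtime_ms(models, source: str, specs: dict) -> float:
    """Average the ensemble members' predictions on the 60-dim feature vector."""
    x = np.concatenate([extract_code_features(source), encode_gpu(specs)])
    x = x.reshape(1, -1)
    return float(np.mean([m.predict(x)[0] for m in models]))
```

Any object with a scikit-learn-style `predict` method can serve as an ensemble member here.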

Model Comparison

| Model    | R²     | RMSE   | Spearman ρ | MAPE  |
|----------|--------|--------|------------|-------|
| GBR      | 0.9923 | 0.0728 | 0.9264     | 16.5% |
| RF       | 0.9924 | 0.0724 | 0.9277     | 16.3% |
| NN       | 0.9932 | 0.0687 | 0.9187     | 17.0% |
| Ensemble | 0.9930 | 0.0693 | 0.9272     | 16.3% |
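A uniform-average ensemble of the three model families can be reproduced with scikit-learn's `VotingRegressor`. The hyperparameters and the synthetic data below are illustrative only; the card does not state the actual training configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              VotingRegressor)
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the real (48 code + 12 GPU) feature matrix
X, y = make_regression(n_samples=200, n_features=60, noise=0.1, random_state=0)

ensemble = VotingRegressor([
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("nn", MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)),
])
ensemble.fit(X, y)
pred = ensemble.predict(X[:1])  # uniform average of the three members' outputs
```

With uniform weights, the ensemble's prediction is simply the mean of the three members' predictions, which matches the near-identical metrics in the table above.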

GPU Catalog (12 GPUs)

| GPU              | FP32 TFLOPS | Memory BW | VRAM  |
|------------------|-------------|-----------|-------|
| NVIDIA T4        | 8.1         | 320 GB/s  | 16 GB |
| NVIDIA V100      | 15.7        | 900 GB/s  | 32 GB |
| NVIDIA A10G      | 31.2        | 600 GB/s  | 24 GB |
| NVIDIA A100 40GB | 19.5        | 1555 GB/s | 40 GB |
| NVIDIA A100 80GB | 19.5        | 2039 GB/s | 80 GB |
| NVIDIA L4        | 30.3        | 300 GB/s  | 24 GB |
| NVIDIA L40S      | 91.6        | 864 GB/s  | 48 GB |
| NVIDIA RTX 3090  | 35.6        | 936 GB/s  | 24 GB |
| NVIDIA RTX 4090  | 82.6        | 1008 GB/s | 24 GB |
| NVIDIA H100 SXM  | 67.0        | 3350 GB/s | 80 GB |
| NVIDIA H100 PCIe | 48.0        | 2039 GB/s | 80 GB |
| NVIDIA RTX A6000 | 38.7        | 768 GB/s  | 48 GB |
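The catalog can be held as a simple lookup table. The values below are copied from the table above; the derived `flops_per_byte` ratio (the roofline balance point between compute and memory bandwidth) is an added illustration, not a documented model feature.

```python
# (FP32 TFLOPS, memory bandwidth GB/s, VRAM GB) -- values from the catalog table
GPU_CATALOG = {
    "NVIDIA T4":        (8.1,  320,  16),
    "NVIDIA V100":      (15.7, 900,  32),
    "NVIDIA A10G":      (31.2, 600,  24),
    "NVIDIA A100 40GB": (19.5, 1555, 40),
    "NVIDIA A100 80GB": (19.5, 2039, 80),
    "NVIDIA L4":        (30.3, 300,  24),
    "NVIDIA L40S":      (91.6, 864,  48),
    "NVIDIA RTX 3090":  (35.6, 936,  24),
    "NVIDIA RTX 4090":  (82.6, 1008, 24),
    "NVIDIA H100 SXM":  (67.0, 3350, 80),
    "NVIDIA H100 PCIe": (48.0, 2039, 80),
    "NVIDIA RTX A6000": (38.7, 768,  48),
}

def gpu_features(name: str) -> dict:
    tflops, bw_gbs, vram_gb = GPU_CATALOG[name]
    return {
        "fp32_tflops": tflops,
        "mem_bw_gbs": bw_gbs,
        "vram_gb": vram_gb,
        # Roofline balance point: FLOPs per byte at which a kernel shifts
        # from bandwidth-bound to compute-bound (illustrative derived feature)
        "flops_per_byte": tflops * 1e12 / (bw_gbs * 1e9),
    }
```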

15 Supported Workload Types

matmul, conv2d, attention, transformer_block, linear, layernorm, batchnorm, softmax, embedding, elementwise, reduction, pooling, FFT, sort, loss+backward
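One common way to feed a categorical workload type into the models is a one-hot indicator over these 15 classes. The encoding below is an assumption for illustration; the card does not specify how workload type enters the 48 code features.

```python
WORKLOADS = [
    "matmul", "conv2d", "attention", "transformer_block", "linear",
    "layernorm", "batchnorm", "softmax", "embedding", "elementwise",
    "reduction", "pooling", "fft", "sort", "loss+backward",
]

def one_hot_workload(name: str) -> list:
    """15-way one-hot indicator for the workload type."""
    vec = [0] * len(WORKLOADS)
    vec[WORKLOADS.index(name)] = 1
    return vec
```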

Usage

```python
# See the Gradio demo for interactive use, or load a model directly:
import pickle

with open('model_gbr.pkl', 'rb') as f:
    model = pickle.load(f)
```

Training

