Configuration-to-Performance Scaling Law with Neural Ansatz
Paper: arXiv:2602.10300
NCPL-intermediate (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that predicts the performance of neural network configurations using scaling laws. It is trained on the Marin and StepLaw datasets to forecast performance metrics from model configurations.
The model consists of (see the illustrative sketch after this list):

- Base Model: Qwen/Qwen3-1.7B
- Numeric MLP: embeds the raw numeric values of configuration tokens (supplied via `number_values_filled`) into the backbone's embedding space
- Prediction Head: maps the backbone's hidden states to the predicted performance metric
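The exact wiring lives in the repository's `ScalingLawForecaster`; the following is a minimal, hypothetical sketch of how a numeric MLP and prediction head could sit on top of the base model, written only to make the component list concrete. The class name `ToyConfigForecaster` and all layer shapes are assumptions, not the repository's code:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class ToyConfigForecaster(nn.Module):
    """Illustrative sketch only; the real ScalingLawForecaster may differ."""

    def __init__(self, base_model_name="Qwen/Qwen3-1.7B"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_model_name)
        hidden = self.backbone.config.hidden_size
        # Assumed numeric MLP: lifts each raw scalar into embedding space.
        self.numeric_mlp = nn.Sequential(
            nn.Linear(1, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
        )
        # Assumed prediction head: final hidden state -> one scalar metric.
        self.head = nn.Linear(hidden, 1)

    def forward(self, input_ids, is_number_mask, number_values_filled,
                attention_mask=None):
        # Ordinary token embeddings from the base model.
        embeds = self.backbone.get_input_embeddings()(input_ids)
        # Value-dependent embedding, added only at numeric token positions.
        value_embeds = self.numeric_mlp(number_values_filled.unsqueeze(-1))
        embeds = embeds + value_embeds * is_number_mask.unsqueeze(-1).to(embeds.dtype)
        out = self.backbone(inputs_embeds=embeds, attention_mask=attention_mask)
        # Predict from the last token's hidden state.
        return self.head(out.last_hidden_state[:, -1, :]).squeeze(-1)
```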
The model was trained on the Marin and StepLaw datasets.
The ScalingLawForecaster class can be found in the GitHub repository.
```python
import torch
from transformers import AutoTokenizer
# Get ScalingLawForecaster from: https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law
from model import ScalingLawForecaster

# Load the model architecture
model = ScalingLawForecaster(
    base_model_name="Qwen/Qwen3-1.7B",
    init_from_pretrained=True,
    force_fp32=True,
)

# Load the trained checkpoint
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Prepare inputs (see the input description below):
# input_ids: tokenized configuration text
# is_number_mask: boolean mask indicating which tokens are numeric
# number_values_filled: actual numeric values (0 for non-numeric tokens)
# attention_mask: standard attention mask from the tokenizer

with torch.no_grad():
    predictions = model(
        input_ids=input_ids,
        is_number_mask=is_number_mask,
        number_values_filled=number_values_filled,
        attention_mask=attention_mask,
    )
```
The model expects three key inputs in addition to the standard attention mask (one way to construct them is sketched below):

- `input_ids`: the tokenized configuration text
- `is_number_mask`: a boolean mask marking which tokens are numeric
- `number_values_filled`: the actual numeric values at those positions (0 for non-numeric tokens)
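As a rough illustration of how these tensors might be built, here is one possible preprocessing pass. The example configuration string and the token-level digit-detection heuristic are assumptions; the repository presumably has its own preprocessing:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Hypothetical configuration string; the real field names and format
# come from the Marin/StepLaw data and are not documented here.
config_text = "hidden_size: 2048, num_layers: 24, learning_rate: 0.0003"

enc = tokenizer(config_text, return_tensors="pt")
input_ids = enc["input_ids"]
attention_mask = enc["attention_mask"]

# Naive heuristic: a token is "numeric" if its decoded text parses as a
# float. Numbers split across several subword tokens will slip through,
# so treat this only as a placeholder for the repository's preprocessing.
mask, values = [], []
for token_id in input_ids[0].tolist():
    text = tokenizer.decode([token_id]).strip()
    try:
        values.append(float(text))
        mask.append(True)
    except ValueError:
        values.append(0.0)
        mask.append(False)

is_number_mask = torch.tensor([mask])          # (1, seq_len), bool
number_values_filled = torch.tensor([values])  # 0.0 at non-numeric tokens
```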
This model is designed for forecasting the performance of neural network configurations, e.g., to compare candidate configurations before committing to full training runs.
If you use this model in your research, please cite:
```bibtex
@article{ncpl2026,
  title   = {Neural Configuration to Performance Scaling Law},
  author  = {Huaqing Zhang and Kaiyue Wen and Tengyu Ma},
  journal = {arXiv preprint arXiv:2602.10300},
  year    = {2026},
  url     = {https://www.arxiv.org/abs/2602.10300}
}
```