arxiv:2602.03120

Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

Published on Feb 3 · Submitted by Yinggan Xu on Feb 16

Abstract

Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine-tuning paradigms, including Reinforcement Learning (RL), fundamentally rely on backpropagation and high-precision weights to compute gradients, so they cannot be applied to quantized models, whose parameter space is discrete and non-differentiable. While Evolution Strategies (ES) offer a backpropagation-free alternative, optimization of the quantized parameters can still fail due to vanishing or inaccurate gradients. This paper introduces Quantized Evolution Strategies (QES), an optimization paradigm that performs full-parameter fine-tuning directly in the quantized space. QES is based on two innovations: (1) it integrates accumulated error feedback to preserve high-precision gradient signals, and (2) it uses stateless seed replay to reduce memory usage to low-precision inference levels. QES significantly outperforms the state-of-the-art zeroth-order fine-tuning method on arithmetic reasoning tasks, making direct fine-tuning of quantized models possible. It therefore opens up the possibility of scaling up LLMs entirely in the quantized space. The source code is available at https://github.com/dibbla/Quantized-Evolution-Strategies.

Community


We study whether it is possible to perform full-parameter fine-tuning of Large Language Models (LLMs) directly within the quantized space, effectively bypassing the need for high-precision weights and standard backpropagation.

Usually, Post-Training Quantization (PTQ) renders a model static; you can't easily apply standard fine-tuning or Reinforcement Learning (RL) because the parameter space becomes discrete and non-differentiable. Even standard Evolution Strategies (ES)—which are backprop-free—often struggle here due to vanishing or inaccurate gradients.
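
To make that failure mode concrete, here is a minimal, hypothetical illustration (our own sketch, not code from the paper, with an assumed uniform grid and step size): once weights live on a coarse quantization grid, any update smaller than half a grid step is rounded away, so a naive ES step makes no progress.

```python
import numpy as np

# Assumed toy setup: weights live on a uniform quantization grid with step 0.1.
step = 0.1
w_q = np.round(np.array([0.3, -0.2, 0.5]) / step) * step

# A high-precision ES update that is smaller than half the grid step.
update = 0.02 * np.array([1.0, -1.0, 1.0])

# Naively applying the update and re-quantizing rounds it away entirely.
w_next = np.round((w_q + update) / step) * step
print(np.array_equal(w_next, w_q))  # True: the gradient signal vanishes
```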

We propose a novel solution called Quantized Evolution Strategies (QES). It enables direct optimization of quantized parameters through two key designs (sketched in code after this list):

  • Accumulated Error Feedback: This preserves high-precision gradient signals that would otherwise be lost.
  • Stateless Seed Replay: This keeps memory usage down to low-precision inference levels.
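
Here is a rough sketch of how these two ideas might fit together in an ES loop. This is our own simplified illustration: the `quantize` grid, the `qes_step` signature, and the update rule are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def quantize(w, step=0.1):
    """Stand-in uniform-grid PTQ; a real low-bit scheme would go here."""
    return np.round(w / step) * step

def qes_step(w_q, err, reward_fn, sigma=0.05, lr=0.1, pop=8, seed=0, step=0.1):
    """One illustrative QES-style update.

    - Stateless seed replay: each perturbation is regenerated from its seed,
      so no perturbation tensor has to be stored alongside the weights.
    - Accumulated error feedback: the rounding residual `err` is carried into
      the next update so sub-grid-step signals are not silently discarded.
    """
    # Evaluate perturbed quantized models (inference-only, low precision).
    rewards = np.array([
        reward_fn(quantize(
            w_q + sigma * np.random.default_rng(seed + i).standard_normal(w_q.shape),
            step))
        for i in range(pop)
    ])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Replay the same seeds to rebuild perturbations for the gradient estimate.
    grad = sum(
        r * np.random.default_rng(seed + i).standard_normal(w_q.shape)
        for i, r in enumerate(rewards)
    ) / (pop * sigma)

    # Apply the update plus the carried-over residual, then re-quantize.
    target = w_q + lr * grad + err
    w_q_new = quantize(target, step)
    return w_q_new, target - w_q_new  # new quantized weights, new residual
```

In a real run, `reward_fn` would score the quantized LLM on the fine-tuning task, and only the low-precision weights, the per-step seeds, and the residual need to be kept between updates.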

QES significantly outperforms state-of-the-art zeroth-order methods on arithmetic reasoning tasks, and more experiments are on the way. This could be a major step toward scaling LLMs entirely in the quantized space!

