ProtNHF: Neural Hamiltonian Flows for Controllable Protein Sequence Generation

Authors: Bharath Raghavan¹, David M. Rogers¹

Affiliations:
¹ National Center for Computational Sciences, Oak Ridge National Laboratory

Introduction

ProtNHF is a generative model for protein sequences that enables continuous, controllable design without retraining. It leverages neural Hamiltonian flows with a Transformer-based energy function to map a latent Gaussian to protein embeddings. Sampling-time bias functions allow steering properties like amino acid composition or net charge smoothly and predictably. Generated sequences achieve high quality as measured by ESM-2 pseudo-perplexity and AlphaFold2 pLDDT scores. ProtNHF provides a flexible, physically interpretable framework for programmable protein sequence generation.

The source code is available here: https://github.com/bharath-raghavan/ProtNHF.git

Model Details

This upload corresponds to model/architecture version 1.

Model Architecture

The following are the model parameters:

  dt: 0.05
  niter: 4
  hidden_dims: 128
  std: 0.7
  integrator: leapfrog
  n_types: 20
  energy:
    d_model: 320
    ff_dim: 1280
    n_heads: 20
    n_layers: 6
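
To illustrate what the `integrator`, `dt`, and `niter` entries control, here is a minimal sketch of leapfrog (kick-drift-kick) integration. It rests on several assumptions that the parameter list does not confirm: that `niter` is the number of leapfrog steps, that the Hamiltonian is separable with a unit-mass kinetic term, and that a toy quadratic potential can stand in for the Transformer energy. The names `grad_energy`, `leapfrog`, and `hamiltonian` are hypothetical and are not taken from the ProtNHF code.

```python
DT = 0.05   # step size, taken from the `dt` entry above
NITER = 4   # assumed to be the number of leapfrog steps (`niter`)


def grad_energy(q):
    """Gradient of a toy potential V(q) = 0.5 * sum(q_i^2).

    Stand-in for the learned Transformer energy, which is not reproduced here.
    """
    return list(q)


def leapfrog(q, p, grad_v, dt=DT, niter=NITER):
    """Integrate H(q, p) = V(q) + 0.5 * |p|^2 with kick-drift-kick steps."""
    q, p = list(q), list(p)
    for _ in range(niter):
        g = grad_v(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half kick
        q = [qi + dt * pi for qi, pi in zip(q, p)]        # full drift
        g = grad_v(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half kick
    return q, p


def hamiltonian(q, p):
    """Total energy of the toy system (potential plus kinetic)."""
    return 0.5 * sum(x * x for x in q) + 0.5 * sum(x * x for x in p)


q0, p0 = [1.0, -0.5, 0.25], [0.3, 0.0, -0.7]
q1, p1 = leapfrog(q0, p0, grad_energy)

# Reversibility check: flip the momentum, integrate again, flip back.
q2, p2 = leapfrog(q1, [-x for x in p1], grad_energy)
p2 = [-x for x in p2]
```

Leapfrog is time-reversible and volume-preserving, which is what makes it a natural choice for an invertible flow: integrating with flipped momentum recovers the starting point up to floating-point error, and the energy of the toy system drifts only slightly over the four steps.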

Training

Training was performed with PyTorch DDP on 64*8 (512) GPUs, with a batch size of 30 per GPU, for 650 epochs. The optimizer and LR scheduler parameters are given below:

  lr: 1e-4
  betas: [0.9, 0.95]
  weight_decay: 0.01
  warmup_epochs: 5
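
For orientation, the sketch below works out the global batch size implied by the setup above and shows one possible warmup schedule. The global batch figure assumes no gradient accumulation. The linear warmup shape and the constant rate after warmup are assumptions made for illustration only, since the schedule type is not stated here; `warmup_lr` is a hypothetical helper, not part of the ProtNHF code.

```python
WORLD_SIZE = 64 * 8      # total number of GPUs (DDP ranks)
PER_GPU_BATCH = 30       # batch size per GPU
BASE_LR = 1e-4           # `lr`
WARMUP_EPOCHS = 5        # `warmup_epochs`

# Sequences per optimizer step across all ranks, assuming no gradient accumulation.
GLOBAL_BATCH = WORLD_SIZE * PER_GPU_BATCH
print(WORLD_SIZE, GLOBAL_BATCH)  # 512 15360


def warmup_lr(epoch, base_lr=BASE_LR, warmup_epochs=WARMUP_EPOCHS):
    """Learning rate for a zero-indexed epoch under an ASSUMED linear warmup.

    The rate ramps linearly up to base_lr over the warmup epochs and is held
    constant afterwards; any decay used in the real run is not modeled.
    """
    scale = min(1.0, (epoch + 1) / warmup_epochs)
    return base_lr * scale


schedule = [warmup_lr(e) for e in range(7)]
```

Under these assumptions the rate reaches `lr` at the fifth epoch and stays there; any decay used after warmup in the actual run would replace the constant tail.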

Citation

If you use ProtNHF in your research, please cite:

B. Raghavan and D. M. Rogers
ProtNHF: Neural Hamiltonian Flows for Controllable Protein Sequence Generation
arXiv:xxxx.xxxxx (2026)

@article{raghavan2026protnhf,
  title   = {ProtNHF: Neural Hamiltonian Flows for Controllable Protein Sequence Generation},
  author  = {Raghavan, Bharath and Rogers, David M.},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2026}
}

License

ProtNHF code and model weights are licensed under the BSD 3-Clause license.
