ProtNHF: Neural Hamiltonian Flows for Controllable Protein Sequence Generation
Authors: Bharath Raghavan¹, David M. Rogers¹
Affiliations:
¹ National Center for Computational Sciences, Oak Ridge National Laboratory
Introduction
ProtNHF is a generative model for protein sequences that enables continuous, controllable design without retraining. It uses neural Hamiltonian flows with a Transformer-based energy function to map a latent Gaussian to protein embeddings. Bias functions applied at sampling time steer properties such as amino acid composition or net charge smoothly and predictably. Generated sequences are of high quality as measured by ESM-2 pseudo-perplexity and AlphaFold2 pLDDT scores. ProtNHF provides a flexible, physically interpretable framework for programmable protein sequence generation.
The source code is available here: https://github.com/bharath-raghavan/ProtNHF.git
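To make the steerable properties named above concrete, the following is a minimal sketch of how net charge and amino acid composition could be measured on a generated sequence. It is not taken from the ProtNHF repository: the function names `net_charge` and `composition` are hypothetical, and the charge uses a simple residue-count approximation at neutral pH (Lys and Arg count +1, Asp and Glu count -1, His and the termini are ignored).

```python
from collections import Counter

# Residue-count approximation at neutral pH (an assumption of this sketch).
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def net_charge(seq):
    """Approximate net charge: (# of K, R) minus (# of D, E)."""
    seq = seq.upper()
    return sum(c in POSITIVE for c in seq) - sum(c in NEGATIVE for c in seq)


def composition(seq):
    """Fraction of each residue type present in the sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}
```

Functions of this kind are only a way to evaluate samples; the bias functions ProtNHF applies during sampling are defined in the source repository.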
Model Details
This upload corresponds to model/architecture version 1.
Model Architecture
The following are the model parameters:
dt: 0.05
niter: 4
hidden_dims: 128
std: 0.7
integrator: leapfrog
n_types: 20
energy:
  d_model: 320
  ff_dim: 1280
  n_heads: 20
  n_layers: 6
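As an illustration of what `integrator: leapfrog` with `dt: 0.05` and `niter: 4` could mean in practice, here is a minimal sketch of a generic leapfrog integrator. It is not the ProtNHF implementation: it assumes `niter` counts integration steps, and it replaces the learned Transformer energy with a toy quadratic potential (`grad_potential` is a hypothetical stand-in).

```python
# Generic leapfrog integrator; NOT the ProtNHF code.
# DT and NITER mirror the card's `dt` and `niter`; everything else is hypothetical.
DT = 0.05
NITER = 4


def grad_potential(q):
    """Gradient of the toy potential U(q) = 0.5 * sum(q_i^2)."""
    return list(q)


def leapfrog(q, p, dt=DT, niter=NITER, grad=grad_potential):
    """Advance positions q and momenta p by `niter` leapfrog steps of size `dt`."""
    q, p = list(q), list(p)
    for _ in range(niter):
        g = grad(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half-step momentum update
        q = [qi + dt * pi for qi, pi in zip(q, p)]        # full-step position update
        g = grad(q)
        p = [pi - 0.5 * dt * gi for pi, gi in zip(p, g)]  # half-step momentum update
    return q, p


def hamiltonian(q, p):
    """Total energy H = U(q) + K(p) for the toy potential and unit mass."""
    return 0.5 * sum(x * x for x in q) + 0.5 * sum(x * x for x in p)
```

Leapfrog is time-reversible and volume-preserving: integrating forward, negating the momenta, and integrating again returns the starting point up to floating-point error. That invertibility is what makes it a common choice for flow-style models.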
Training
Training used PyTorch DistributedDataParallel (DDP) on 64*8 (512) GPUs with a per-GPU batch size of 30 and ran for 650 epochs. The optimizer and LR scheduler parameters are given below:
lr: 1e-4
betas: [0.9, 0.95]
weight_decay: 0.01
warmup_epochs: 5
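The settings above can be turned into two derived quantities, sketched below. The numbers are copied from this card, but the schedule shape is an assumption: the card gives only `warmup_epochs`, so the sketch uses a linear warmup followed by a constant rate, and the global batch size assumes no gradient accumulation. The name `lr_at_epoch` is hypothetical.

```python
# Values copied from the card; the schedule shape itself is an assumption.
BASE_LR = 1e-4
WARMUP_EPOCHS = 5
N_GPUS = 64 * 8
BATCH_PER_GPU = 30

# Samples per optimizer step across all GPUs (assumes no gradient accumulation).
GLOBAL_BATCH = N_GPUS * BATCH_PER_GPU


def lr_at_epoch(epoch, base_lr=BASE_LR, warmup_epochs=WARMUP_EPOCHS):
    """Linear warmup over `warmup_epochs` (assumed), then constant `base_lr` (assumed)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr
```

Under these assumptions each optimizer step sees 512 * 30 = 15360 sequences, and the learning rate reaches `lr` at the end of the fifth epoch. Any decay applied after warmup is not specified here and would have to be checked against the source code.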
Citation
If you use ProtNHF in your research, please cite:
B. Raghavan and D. M. Rogers
ProtNHF: Neural Hamiltonian Flows for Controllable Protein Sequence Generation
arXiv:xxxx.xxxxx (2026)
@article{raghavan2026protnhf,
title = {ProtNHF: Neural Hamiltonian Flows for Controllable Protein Sequence Generation},
author = {Raghavan, Bharath and Rogers, David M.},
journal = {arXiv preprint arXiv:xxxx.xxxxx},
year = {2026}
}
License
The ProtNHF code and model weights are licensed under the BSD 3-Clause license.