Vevo2 Models (Mæstræa Mirror)

Singing Voice Synthesis, Conversion & Editing

Original Model by OpenMMLab / Amphion · MIT License

This is a mirror of the Vevo2 model weights for use with Mæstræa AI Workstation. All credits go to the original authors.

What's in This Repo

Path	Description	Size
`contentstyle_modeling/PhoneToVq8192/model.safetensors`	AR model (Qwen2.5-0.5B, ~500M params)	~2.5 GB
`contentstyle_modeling/Vq32ToVq8192/model.safetensors`	Style transfer model	~1.5 GB
`acoustic_modeling/Vq8192ToMels/model.safetensors`	Flow matching model (~350M params)	~1.4 GB
`acoustic_modeling/Vocoder/model*.safetensors`	Vocos vocoder (~250M params)	~1 GB
`tokenizer/vq32/`	HuBERT tokenizer (pickle + config)	~1.3 GB
`tokenizer/vq8192/model.safetensors`	VQ8192 tokenizer	~200 MB

Total: ~8 GB
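The weights total roughly 8 GB. Before downloading manually, you can compare that figure against the free space on the target filesystem. The sketch below shows one way to do this in Python. The helper name `has_room_for_vevo2` and the 1 GB safety margin are illustrative assumptions, not part of the Mæstræa backend. Sizes are treated as decimal gigabytes because the table values are approximate.

```python
import shutil
from pathlib import Path

# Approximate total from the table above (~8 GB), in decimal gigabytes.
VEVO2_TOTAL_BYTES = 8 * 10**9
# Hypothetical headroom, since the listed sizes are only approximate.
SAFETY_MARGIN_BYTES = 1 * 10**9


def has_room_for_vevo2(target: Path) -> bool:
    """Return True if the filesystem holding `target` has enough free space.

    `target` may not exist yet, so walk up to the nearest existing parent
    before asking the OS for disk usage.
    """
    probe = target.expanduser()
    while not probe.exists():
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    return free >= VEVO2_TOTAL_BYTES + SAFETY_MARGIN_BYTES


if __name__ == "__main__":
    dest = Path("~/.maestraea/models/vevo2")
    print("enough space" if has_room_for_vevo2(dest) else "not enough space")
```

Walking up to an existing parent means the check works before the model directory has been created.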

Vevo2 is a state-of-the-art voice conversion and singing voice synthesis system from the Amphion toolkit. It also supports singing voice editing.

Recommended: keep reference audio clips between 15 and 45 seconds long.
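You can check the 15–45 second recommendation before passing a clip to the model. The Python sketch below assumes uncompressed PCM WAV input, because the standard-library `wave` module cannot read MP3 or FLAC. The function names are hypothetical and are not part of Mæstræa or Amphion. `write_silent_wav` exists only to make the demo self-contained.

```python
import os
import tempfile
import wave

# Recommended reference-audio window from this card.
MIN_REF_SECONDS = 15.0
MAX_REF_SECONDS = 45.0


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_recommended_reference(path: str) -> bool:
    """True if the clip length falls inside the recommended 15-45 s window."""
    return MIN_REF_SECONDS <= wav_duration_seconds(path) <= MAX_REF_SECONDS


def write_silent_wav(path: str, seconds: float, rate: int = 16000) -> None:
    """Write a mono, 16-bit silent WAV (demo helper only)."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for secs in (10, 30, 60):
            p = os.path.join(tmp, f"ref_{secs}s.wav")
            write_silent_wav(p, secs)
            # expected: 10 False, 30 True, 60 False
            print(secs, is_recommended_reference(p))
```

Both bounds are inclusive, so clips of exactly 15 or 45 seconds pass.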

The Mæstræa AI Workstation backend downloads these models automatically. To install them manually, place them in:

~/.maestraea/models/vevo2/
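After a manual install, you can confirm that the directory matches the table above by listing which entries are missing. The Python sketch below does this. The helper name `missing_vevo2_files` is hypothetical. It checks only that each path exists, not file integrity or size. It treats `model*.safetensors` as a glob and checks `tokenizer/vq32/` only as a directory, because the table does not name the files inside it.

```python
from pathlib import Path

# Relative paths taken from the "What's in This Repo" table.
EXPECTED_FILES = [
    "contentstyle_modeling/PhoneToVq8192/model.safetensors",
    "contentstyle_modeling/Vq32ToVq8192/model.safetensors",
    "acoustic_modeling/Vq8192ToMels/model.safetensors",
    "tokenizer/vq8192/model.safetensors",
]
# The vocoder entry is listed as a wildcard pattern in the table.
EXPECTED_GLOBS = ["acoustic_modeling/Vocoder/model*.safetensors"]
# The HuBERT tokenizer is a directory whose file names are not listed.
EXPECTED_DIRS = ["tokenizer/vq32"]


def missing_vevo2_files(root: Path) -> list[str]:
    """Return the table entries that are not present under `root`."""
    root = root.expanduser()
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [g for g in EXPECTED_GLOBS if not any(root.glob(g))]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    gaps = missing_vevo2_files(Path("~/.maestraea/models/vevo2"))
    print("layout OK" if not gaps else "missing: " + ", ".join(gaps))
```

The function returns a list rather than a single boolean, so you can see which component failed to download.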

License: MIT, the same as the original Amphion/Vevo2 release.
