Vevo2 Models (Mæstræa Mirror)

Singing Voice Synthesis, Conversion & Editing

Original Model by OpenMMLab / Amphion · MIT License

This is a mirror of the Vevo2 model weights for use with Mæstræa AI Workstation. All credits go to the original authors.

What's in This Repo

Path                                                    Description                             Size
contentstyle_modeling/PhoneToVq8192/model.safetensors   AR model (Qwen2.5-0.5B, ~500M params)   ~2.5 GB
contentstyle_modeling/Vq32ToVq8192/model.safetensors    Style transfer model                    ~1.5 GB
acoustic_modeling/Vq8192ToMels/model.safetensors        Flow matching model (~350M params)      ~1.4 GB
acoustic_modeling/Vocoder/model*.safetensors            Vocos vocoder (~250M params)            ~1 GB
tokenizer/vq32/                                         HuBERT tokenizer (pickle + config)      ~1.3 GB
tokenizer/vq8192/model.safetensors                      VQ8192 tokenizer                        ~200 MB

Total: ~8 GB

What Vevo2 Does

Vevo2 is a state-of-the-art voice conversion and singing voice synthesis system from the Amphion toolkit. It supports:

  • Voice Conversion — Transform vocals to a target voice/timbre
  • Singing Voice Synthesis — Generate singing from text + melody
  • Speech Editing — Modify speech content while preserving speaker identity
  • Zero-Shot TTS — Generate speech in any voice from a short reference

Architecture

  • AR Model (Qwen2.5-0.5B) — Autoregressive content-style modeling
  • FM Model (~350M) — Flow matching for acoustic generation
  • Vocos Vocoder (~250M) — High-quality waveform synthesis
  • Total: ~1.1B parameters
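The quoted total is just the sum of the component counts above; a quick check of that arithmetic:

```python
# Approximate parameter counts from the architecture list above, in millions.
params_m = {
    "AR model (Qwen2.5-0.5B)": 500,
    "FM model": 350,
    "Vocos vocoder": 250,
}

total_b = sum(params_m.values()) / 1000  # millions -> billions
print(f"total ~= {total_b:.1f}B parameters")  # matches the ~1.1B on the card
```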

VRAM Requirements

Reference Length   VRAM
15 s               ~8 GB
30 s               ~10 GB
45 s               ~12 GB

Recommended: Keep reference audio to 15–45 seconds.
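The table above grows roughly linearly with reference length. A minimal estimator based on that observation (the linear fit is my own extrapolation from the three table rows, not an official figure):

```python
def estimate_vram_gb(reference_seconds: float) -> float:
    """Rough VRAM estimate interpolated from the table above.

    Assumes linear growth: 15 s -> ~8 GB, 45 s -> ~12 GB,
    i.e. a ~6 GB base plus ~2 GB per 15 s of reference audio.
    This is an assumption for planning purposes, not a guarantee.
    """
    return 6.0 + (2.0 / 15.0) * reference_seconds

for secs in (15, 30, 45):
    print(f"{secs:>2d} s -> ~{estimate_vram_gb(secs):.0f} GB")
```

Staying within the recommended 15–45 s window keeps the estimate inside the measured range rather than extrapolating.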

Usage with Mæstræa

These models are downloaded automatically by the Mæstræa AI Workstation backend. If installing manually, place the files in:

~/.maestraea/models/vevo2/
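After a manual install, you can sanity-check the layout before launching the backend. This is an illustrative sketch, not part of the official Mæstræa tooling; the expected relative paths are taken from the repo table above:

```python
from pathlib import Path

# Key weight files from the "What's in This Repo" table above.
EXPECTED_FILES = [
    "contentstyle_modeling/PhoneToVq8192/model.safetensors",
    "contentstyle_modeling/Vq32ToVq8192/model.safetensors",
    "acoustic_modeling/Vq8192ToMels/model.safetensors",
    "tokenizer/vq8192/model.safetensors",
]

def missing_files(root: Path) -> list[str]:
    """Return the expected weight files that are absent under root."""
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

if __name__ == "__main__":
    root = Path.home() / ".maestraea" / "models" / "vevo2"
    for rel in missing_files(root):
        print(f"missing: {rel}")
```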

License

MIT — same as the original Amphion/Vevo2 release.

Credits

Original Vevo2 release by the Amphion project (OpenMMLab); base model: amphion/Vevo. This repository only mirrors the published weights, and all credit goes to the original authors.