Vevo2 Models (Mæstræa Mirror)
Singing Voice Synthesis, Conversion & Editing
Original Model by OpenMMLab / Amphion · MIT License
This is a mirror of the Vevo2 model weights for use with Mæstræa AI Workstation. All credit goes to the original authors.
What's in This Repo
| Path | Description | Size |
|---|---|---|
| `contentstyle_modeling/PhoneToVq8192/model.safetensors` | AR model (Qwen2.5-0.5B, ~500M params) | ~2.5 GB |
| `contentstyle_modeling/Vq32ToVq8192/model.safetensors` | Style transfer model | ~1.5 GB |
| `acoustic_modeling/Vq8192ToMels/model.safetensors` | Flow matching model (~350M params) | ~1.4 GB |
| `acoustic_modeling/Vocoder/model*.safetensors` | Vocos vocoder (~250M params) | ~1 GB |
| `tokenizer/vq32/` | HuBERT tokenizer (pickle + config) | ~1.3 GB |
| `tokenizer/vq8192/model.safetensors` | VQ8192 tokenizer | ~200 MB |
Total: ~8 GB
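The parameter counts in the table are approximate. A minimal sketch for checking them against a downloaded checkpoint reads only the JSON header of a `.safetensors` file, so multi-GB weights are never loaded. It assumes the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header whose tensor entries carry `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry). The helper names `parse_header`, `count_params`, and `count_safetensors_params` are hypothetical and are not part of Mæstræa or Amphion.

```python
from __future__ import annotations

import json
import struct
from math import prod
from pathlib import Path


def parse_header(blob: bytes) -> dict:
    """Decode the JSON header at the start of a safetensors byte string.

    Assumed layout: an 8-byte little-endian u64 header length, then that
    many bytes of UTF-8 JSON.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len])


def count_params(header: dict) -> int:
    """Sum element counts over all tensor entries, skipping __metadata__."""
    return sum(
        prod(entry["shape"])  # prod([]) == 1, so a scalar counts as one element
        for name, entry in header.items()
        if name != "__metadata__"
    )


def count_safetensors_params(path: str | Path) -> int:
    """Count parameters in a .safetensors file without reading the weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        f.seek(0)  # re-read the length prefix so parse_header sees the full header
        return count_params(parse_header(f.read(8 + header_len)))
```

If the card's figure is right, running `count_safetensors_params` on `acoustic_modeling/Vq8192ToMels/model.safetensors` should land near ~350M. The vocoder is matched by `model*.safetensors`, so if it is sharded, the counts of each matching file would need to be summed.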
What Vevo2 Does
Vevo2 is a voice conversion and singing voice synthesis system from the Amphion toolkit. It supports:
- Voice Conversion — Transform vocals to a target voice/timbre
- Singing Voice Synthesis — Generate singing from text + melody
- Speech Editing — Modify speech content while preserving speaker identity
- Zero-Shot TTS — Generate speech in any voice from a short reference
Architecture
- AR Model (Qwen2.5-0.5B) — Autoregressive content-style modeling
- FM Model (~350M) — Flow matching for acoustic generation
- Vocos Vocoder (~250M) — High-quality waveform synthesis
- Total: ~1.1B parameters (AR + FM + vocoder)
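The three components above form a chain. A minimal sketch of that chain, assuming each stage's input and output are what its checkpoint directory name suggests (`PhoneToVq8192`, `Vq8192ToMels`, then the vocoder producing a waveform), checks that the stages connect and that the per-component figures add up to the stated total. `Stage`, `STAGES`, `check_chain`, and `total_params` are illustrative names only; nothing here calls into Amphion.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the inference chain, as summarized in this card."""
    name: str
    consumes: str
    produces: str
    approx_params: int  # rounded figure from the card, not an exact count


# Order and I/O are inferred from the checkpoint directory names;
# parameter counts are the card's approximations.
STAGES = [
    Stage("AR model (Qwen2.5-0.5B)", "phones", "vq8192 tokens", 500_000_000),
    Stage("Flow matching model", "vq8192 tokens", "mel spectrogram", 350_000_000),
    Stage("Vocos vocoder", "mel spectrogram", "waveform", 250_000_000),
]


def check_chain(stages: list[Stage]) -> None:
    """Raise ValueError if a stage's input is not the previous stage's output."""
    for prev, nxt in zip(stages, stages[1:]):
        if prev.produces != nxt.consumes:
            raise ValueError(
                f"{prev.name} -> {nxt.name}: {prev.produces!r} != {nxt.consumes!r}"
            )


def total_params(stages: list[Stage]) -> int:
    """Sum the approximate parameter counts of all stages."""
    return sum(s.approx_params for s in stages)


check_chain(STAGES)
print(f"~{total_params(STAGES) / 1e9:.1f}B parameters")  # prints "~1.1B parameters"
```

The style transfer model (`Vq32ToVq8192`) and the tokenizers are not part of this sum, matching the components listed under Architecture.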
VRAM Requirements
| Reference Length | VRAM |
|---|---|
| 15s | ~8 GB |
| 30s | ~10 GB |
| 45s | ~12 GB |
Recommended: Keep reference audio to 15–45 seconds.
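In the table, VRAM grows by roughly 2 GB for every additional 15 s of reference audio. A minimal sketch for budgeting a run measures a WAV reference with the standard-library `wave` module and linearly interpolates between the three measured points. Linear behavior between those points is an assumption, and the sketch refuses to extrapolate outside 15 to 45 s, matching the recommendation. `reference_seconds` and `estimate_vram_gb` are hypothetical names, and `wave` only reads uncompressed PCM WAV files.

```python
import wave

# (reference seconds, approx. VRAM in GB) copied from the VRAM table.
VRAM_TABLE = [(15.0, 8.0), (30.0, 10.0), (45.0, 12.0)]


def reference_seconds(path: str) -> float:
    """Duration of a PCM WAV file, from its frame count and sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def estimate_vram_gb(seconds: float) -> float:
    """Linearly interpolate the card's VRAM figures for a reference length.

    Only defined inside the measured range; outside it the card gives
    no numbers, so this raises instead of extrapolating.
    """
    lo_s, hi_s = VRAM_TABLE[0][0], VRAM_TABLE[-1][0]
    if not lo_s <= seconds <= hi_s:
        raise ValueError(
            f"{seconds:.1f}s is outside the recommended {lo_s:.0f}-{hi_s:.0f}s range"
        )
    # Find the table segment containing `seconds` and interpolate within it.
    for (s0, v0), (s1, v1) in zip(VRAM_TABLE, VRAM_TABLE[1:]):
        if seconds <= s1:
            return v0 + (v1 - v0) * (seconds - s0) / (s1 - s0)
    return VRAM_TABLE[-1][1]  # unreachable after the range check
```

For example, a 37.5 s reference interpolates to about 11 GB. Real usage will also depend on the GPU, precision, and the length of the generated output, none of which the table covers.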
Usage with Mæstræa
These models are downloaded automatically by the Mæstræa AI Workstation backend. To install them manually, place them in:
`~/.maestraea/models/vevo2/`
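Because `tokenizer/vq32/` is a directory and the vocoder weights are matched by `model*.safetensors`, a manual install is easiest to verify with glob patterns. A minimal sketch, assuming the layout from the "What's in This Repo" table sits directly under the directory named above; `EXPECTED` and `missing_files` are hypothetical helpers, not part of the Mæstræa backend.

```python
from __future__ import annotations

from pathlib import Path

# Default install location named in this card.
DEFAULT_ROOT = Path.home() / ".maestraea" / "models" / "vevo2"

# Glob patterns taken from the file table. The tokenizer/vq32/ directory
# only counts as present if it contains at least one entry.
EXPECTED = [
    "contentstyle_modeling/PhoneToVq8192/model.safetensors",
    "contentstyle_modeling/Vq32ToVq8192/model.safetensors",
    "acoustic_modeling/Vq8192ToMels/model.safetensors",
    "acoustic_modeling/Vocoder/model*.safetensors",
    "tokenizer/vq32/*",
    "tokenizer/vq8192/model.safetensors",
]


def missing_files(root: Path = DEFAULT_ROOT) -> list[str]:
    """Return the patterns from EXPECTED that match nothing under root."""
    return [pattern for pattern in EXPECTED if not any(root.glob(pattern))]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing under", DEFAULT_ROOT)
        for pattern in missing:
            print("  ", pattern)
    else:
        print("All Vevo2 files found in", DEFAULT_ROOT)
```

This only checks that the files exist; it does not verify sizes or checksums.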
License
MIT — same as the original Amphion/Vevo2 release.
Credits
- Model: Amphion Vevo2
- Paper: see the Amphion repository for the citation
- Mirror by: AEmotionStudio
- Base model: amphion/Vevo