LEMAS-TTS

Tags: Text-to-Speech, ONNX, zero-shot, multilingual

LEMAS-TTS is a multilingual zero-shot text-to-speech system introduced in the paper "LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models".

Model Description

LEMAS-TTS is built on a non-autoregressive flow-matching framework. Trained on the large-scale, linguistically diverse LEMAS-Dataset, it achieves robust zero-shot synthesis across multiple languages. To mitigate cross-lingual accent issues (such as a reference speaker's accent carrying over into another language), the model adds accent-adversarial training and a CTC loss, which improve synthesis stability and quality across languages.
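The flow-matching idea can be sketched in a few lines. This is a generic conditional flow-matching setup with a straight-line (rectified-flow style) path between Gaussian noise and data, written in NumPy; it is not the LEMAS-TTS implementation. The function names (`cfm_training_pair`, `cfm_loss`, `euler_sample`) are hypothetical, the text and reference-audio conditioning a real TTS model would need is omitted, and the assumption that the model operates on mel-spectrogram frames with an Euler ODE solver is ours, not the card's. Because the state `x` holds a whole utterance (frames by mel bins), every frame is updated in parallel at each step, which is what makes this kind of decoder non-autoregressive.

```python
import numpy as np


def cfm_training_pair(x0, x1, t):
    """Build one conditional flow-matching training example on a straight path.

    x0: Gaussian noise, x1: a data sample (assumed here to be a mel-spectrogram),
    t: scalar time in [0, 1]. Returns the point x_t on the path and the
    velocity the network should regress at (x_t, t).
    """
    x_t = (1.0 - t) * x0 + t * x1  # linear interpolation from noise (t=0) to data (t=1)
    v_target = x1 - x0             # the path's velocity, constant in t
    return x_t, v_target


def cfm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


def euler_sample(velocity_fn, x0, num_steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps.

    All frames of x are updated together at every step (non-autoregressive).
    """
    x = np.array(x0, dtype=float, copy=True)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Usage with random stand-in data: 100 frames x 80 mel bins.
rng = np.random.default_rng(0)
x1 = rng.standard_normal((100, 80))
x0 = rng.standard_normal(x1.shape)
x_t, v = cfm_training_pair(x0, x1, t=0.3)
print(cfm_loss(np.zeros_like(v), v))  # loss of a "model" that always predicts zero velocity

# An oracle field that knows the endpoint moves x0 straight onto x1.
sample = euler_sample(lambda x, t: x1 - x0, x0, num_steps=8)
print(np.allclose(sample, x1))  # True
```

With the oracle velocity, Euler integration lands on `x1` for any step count because the conditional path is straight. A trained network only approximates the (generally curved) marginal velocity field it is regressed onto, so practical samplers trade the number of steps against quality.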
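CTC loss scores how well a sequence of per-frame predictions explains a shorter label sequence by summing over every monotonic alignment between them. The sketch below implements the standard forward (alpha) recursion in log space with NumPy to illustrate the loss itself. Which features the LEMAS-TTS CTC head reads, what label inventory it predicts, and how the loss is weighted are not stated in this card and are not modeled here. The name `ctc_nll` is hypothetical; in a PyTorch training loop one would normally use `torch.nn.CTCLoss` instead.

```python
import math

import numpy as np


def _logsumexp(*values):
    """Numerically stable log(sum(exp(v))) over a few scalars."""
    m = max(values)
    if m == -math.inf:
        return -math.inf  # every term is log(0)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_nll(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    log_probs: (T, V) array of per-frame log-probabilities over the vocabulary.
    target: sequence of label ids, none equal to `blank`.
    """
    num_frames = log_probs.shape[0]
    # Extended label sequence with blanks around every label: [-, y1, -, y2, ..., -]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states = len(ext)

    # alpha[t, s]: log of the total probability of length-(t+1) alignment prefixes ending in state s
    alpha = np.full((num_frames, num_states), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]          # start with a blank...
    if num_states > 1:
        alpha[0, 1] = log_probs[0, ext[1]]      # ...or with the first label

    for t in range(1, num_frames):
        for s in range(num_states):
            candidates = [alpha[t - 1, s]]              # stay in the same state
            if s >= 1:
                candidates.append(alpha[t - 1, s - 1])  # advance by one state
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[t - 1, s - 2])
            alpha[t, s] = _logsumexp(*candidates) + log_probs[t, ext[s]]

    # Valid alignments end on the last label or on the trailing blank
    if num_states == 1:
        log_likelihood = alpha[-1, 0]
    else:
        log_likelihood = _logsumexp(alpha[-1, -1], alpha[-1, -2])
    return -float(log_likelihood)


# Usage: 2 frames, vocabulary {0: blank, 1: "a"}, uniform probabilities.
# The alignments "aa", "a-" and "-a" all collapse to "a", each with probability 0.25.
uniform = np.log(np.full((2, 2), 0.5))
print(math.exp(-ctc_nll(uniform, [1])))  # ≈ 0.75
```

An impossible target (for example a repeated label with too few frames to fit the separating blank) yields an infinite loss, which is also how `torch.nn.CTCLoss` behaves unless `zero_infinity=True` is set.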

Supported Languages

The model supports 10 major languages for zero-shot synthesis:

  • Chinese (zh)
  • English (en)
  • Spanish (es)
  • Russian (ru)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Indonesian (id)
  • Vietnamese (vi)
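The language list above maps directly onto a lookup table. The sketch below is a hypothetical input-validation helper (`SUPPORTED_LANGUAGES` and `check_language` are illustrative names, not part of LEMAS-TTS); it assumes a caller passes the two-letter codes shown in the list, and this card does not describe how the actual inference code receives the target language.

```python
# Hypothetical helper: not part of the LEMAS-TTS codebase.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese",
    "en": "English",
    "es": "Spanish",
    "ru": "Russian",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "id": "Indonesian",
    "vi": "Vietnamese",
}


def check_language(code: str) -> str:
    """Normalize a language code and return its name, or raise ValueError if unsupported."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language code {code!r}; expected one of: {supported}")
    return SUPPORTED_LANGUAGES[key]


print(check_language("VI"))  # Vietnamese
```

Failing early on an unsupported code gives a clearer error than passing an unknown language into a zero-shot model, whose behavior outside its 10 training languages is not documented here.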

Training Data

LEMAS-TTS was trained on the LEMAS-Dataset, which covers over 150,000 hours of speech across the 10 languages listed above. To our knowledge, it is currently the largest open-source multilingual speech corpus with word-level timestamps.

Citation

@article{zhao2026lemas,
  title={LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models},
  author={Zhao, Zhiyuan and Lin, Lijian and Zhu, Ye and Xie, Kai and Liu, Yunfei and Li, Yu},
  journal={arXiv preprint arXiv:2601.04233},
  year={2026}
}