LinaCodec: A highly compressive audio tokenizer for speech models.

LinaCodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio!

Key benefits

Compression: 12.5 tokens/sec (about 60x fewer tokens per second than DAC).
Audio Quality: 48 kHz output (much clearer than the standard 16 kHz/24 kHz).
Encoder Speed: 200x realtime.
Decoder Speed: 400x realtime (even faster with batching).
Many Tasks: Also indirectly supports voice conversion, audio super-resolution, and audio denoising!
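
To make the figures above concrete, here is a small arithmetic sketch in Python. It only uses the numbers quoted in this card (12.5 tokens/sec, 171 bps, 200x and 400x realtime); the function names are illustrative and are not part of the LinaCodec API. It assumes the quoted bitrate is spread evenly across tokens and that the realtime factors hold for long inputs.

```python
# Figures quoted in this model card.
TOKENS_PER_SECOND = 12.5
BITRATE_BPS = 171.0
ENCODER_RTF = 200.0  # "200x realtime"
DECODER_RTF = 400.0  # "400x realtime"


def bits_per_token(bitrate_bps: float, tokens_per_second: float) -> float:
    """Average number of bits carried by each token (assumes an even spread)."""
    return bitrate_bps / tokens_per_second


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to process audio at a given realtime multiple."""
    return audio_seconds / realtime_factor


one_hour = 3600.0
print(f"bits per token: {bits_per_token(BITRATE_BPS, TOKENS_PER_SECOND):.2f}")
print(f"encode 1 hour of audio: {processing_seconds(one_hour, ENCODER_RTF):.1f} s")
print(f"decode 1 hour of audio: {processing_seconds(one_hour, DECODER_RTF):.1f} s")
```

Under these assumptions each token carries roughly 13.7 bits, and one hour of audio takes about 18 seconds to encode and 9 seconds to decode.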

Why is this even useful?

Audio tokenizers directly contribute to the speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.

Inference Speed: Enables TTS models to run 800x realtime, 8x faster than MiraTTS!
Fast training: High-quality TTS models can be trained in less than 1 day.
Versatile: Works for both Text-to-Speech and Speech-to-Text, unlike most other codecs.
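
The inference-speed claim can be unpacked with a rough back-of-the-envelope sketch. The Python below assumes that TTS speed is limited only by how many codec tokens the model emits per wall-clock second, which ignores decoding and other overheads; the helper names are hypothetical and the 50 tokens/sec comparison value is taken from the Xcodec2 row in the comparison table.

```python
def required_token_throughput(realtime_factor: float, tokens_per_audio_second: float) -> float:
    """Codec tokens the TTS model must emit per wall-clock second."""
    return realtime_factor * tokens_per_audio_second


def implied_baseline_rtf(realtime_factor: float, speedup: float) -> float:
    """Realtime factor of a baseline that is `speedup` times slower."""
    return realtime_factor / speedup


def realtime_factor_at(token_throughput: float, tokens_per_audio_second: float) -> float:
    """Realtime factor reached at a fixed token throughput with another codec."""
    return token_throughput / tokens_per_audio_second


throughput = required_token_throughput(800.0, 12.5)
print(f"tokens per wall-clock second at 800x: {throughput:.0f}")
print(f"implied MiraTTS speed: {implied_baseline_rtf(800.0, 8.0):.0f}x realtime")
print(f"same throughput with a 50 tokens/sec codec: {realtime_factor_at(throughput, 50.0):.0f}x realtime")
```

With these assumptions, 800x realtime corresponds to about 10,000 tokens per wall-clock second, and the same token throughput with a 50 tokens/sec codec would yield only about 200x realtime. This illustrates why a lower token rate translates directly into faster speech models.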

Comparisons

Model	Total Tokens/Sec	Sample Rate
LinaCodec	12.5	48 kHz
DAC	774	44.1 kHz
EnCodec	300	24 kHz
Xcodec2	50	16 kHz
Mimi	200	24 kHz
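
As a rough illustration of what these token rates mean in practice, the following Python sketch computes how many tokens each codec would produce for a clip and how that compares with LinaCodec. The numbers are copied from the table; the dictionary and function names are illustrative only and do not come from any of these libraries.

```python
# Total tokens per second, as listed in the comparison table.
TOKENS_PER_SECOND = {
    "LinaCodec": 12.5,
    "DAC": 774.0,
    "EnCodec": 300.0,
    "Xcodec2": 50.0,
    "Mimi": 200.0,
}


def tokens_for_clip(model: str, seconds: float) -> float:
    """Number of tokens a codec emits for a clip of the given duration."""
    return TOKENS_PER_SECOND[model] * seconds


def ratio_vs_linacodec(model: str) -> float:
    """How many times more tokens per second a codec uses than LinaCodec."""
    return TOKENS_PER_SECOND[model] / TOKENS_PER_SECOND["LinaCodec"]


for name in TOKENS_PER_SECOND:
    print(f"{name}: {tokens_for_clip(name, 10.0):.0f} tokens per 10 s, "
          f"{ratio_vs_linacodec(name):.1f}x LinaCodec's rate")
```

For a 10-second clip this gives 125 tokens for LinaCodec against 7,740 for DAC, which is where the roughly 60x compression figure comes from.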

Please check the repo for usage: https://github.com/ysharma3501/LinaCodec

The license is CC-BY-4.0, meaning you can use it for any use case (commercial or non-commercial) provided you credit the original creator. Thank you.

YatharthS/LinaCodec
