What is the difference between the encoders for Qwen2-Audio, Qwen2.5-Omni, Audio Flamingo 3 and Kimi-Audio?

#1
by mifanbushipeicai - opened

As far as I know, they are all initialized from Whisper large-v3. Is that correct?

In Kimi-Audio, the Whisper encoder is trained rather than frozen, and they use 13 million hours of audio data covering a wide range of modalities, including speech, sound, and music.
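
To make the frozen-versus-trained distinction concrete, here is a minimal PyTorch sketch. The small `encoder` and `adapter` modules are hypothetical stand-ins for a Whisper-style encoder and a projection layer, not the actual code of Kimi-Audio or any of the other models; the point is only how toggling `requires_grad` changes which parameters get updated.

```python
import torch
from torch import nn

# Hypothetical stand-in for a Whisper-style audio encoder (not the real model).
encoder = nn.Sequential(nn.Linear(80, 32), nn.GELU(), nn.Linear(32, 32))
# Hypothetical adapter that would project encoder features toward an LLM.
adapter = nn.Linear(32, 16)


def count_trainable(module: nn.Module) -> int:
    """Return the number of parameters that would receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Frozen-encoder setup: only the adapter is updated during training.
encoder.requires_grad_(False)
frozen_trainable = count_trainable(encoder) + count_trainable(adapter)

# Trained-encoder setup: the encoder weights are updated as well.
encoder.requires_grad_(True)
full_trainable = count_trainable(encoder) + count_trainable(adapter)

print(f"trainable params, encoder frozen:  {frozen_trainable}")
print(f"trainable params, encoder trained: {full_trainable}")
```

In a real pipeline, the optimizer would then be built only from parameters with `requires_grad=True`, so a frozen encoder keeps its Whisper initialization unchanged while a trained one adapts to the new audio data.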

