HuggingFaceTB/SmolVLM-256M-Instruct Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 362k • 349
Running on Zero • Agents • Featured • 1.77k • Dia 1.6B • Generate realistic dialogue from a script, using Dia!
Paused • Agents • Featured • 229 • Spark TTS • A text-to-speech model powered by SparkAudio and Mobvoi.
HuggingFaceTB/SmolVLM2-500M-Video-Instruct Image-Text-to-Text • Updated Apr 8, 2025 • 353k • 132
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 333k • 1.58k
Running • Featured • 358 • Kokoro Text-to-Speech (WebGPU) • High-quality speech synthesis powered by Kokoro TTS
mlx-community/SmolVLM2-500M-Video-Instruct-mlx Video-Text-to-Text • Updated Feb 20, 2025 • 2.29k • 18
Running on Zero • Agents • Featured • 3.59k • InstantID • Generate a custom image that keeps your face identity
Runtime error • Agents • 48 • InstructBLIP • Instruction-tuned model for a range of vision-language tasks
Runtime error • Agents • Featured • 33 • CLIPnCROP • Extract and crop image sections based on a text description