Collections

Collections including paper arxiv:2503.20215

RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Paper • 2510.23763 • Published Oct 27 • 53
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 89
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 139
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

Paper • 2510.13747 • Published Oct 15 • 29

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
totally-not-an-llm/EverythingLM-13b-16k

Text Generation • Updated Apr 23, 2024 • 2.14k • 33
llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • 8B • Updated May 1 • 371k • 297

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 263
DINOv3

Paper • 2508.10104 • Published Aug 13 • 285

Vision Language Models: 2025 Update

This collection includes all the models, datasets and Spaces mentioned in the blog Vision Language Models: 2025 Update

Qwen/Qwen2.5-Omni-7B

Any-to-Any • 11B • Updated Apr 30 • 139k • 1.83k
🏆 Qwen2.5 Omni 7B Demo

Space • Running • Featured • 363 • Generate text and speech from text, audio, images, and videos
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
openbmb/MiniCPM-o-2_6

Any-to-Any • 9B • Updated Oct 5 • 97.2k • 1.27k

Llammy3.2-3B-GUFF

prithivMLmods/Llama-Sentient-3.2-3B-Instruct

Text Generation • Updated Dec 10, 2024 • 22 • 9
bartendr604/Llama.Diffusion.Flix

Updated Apr 12 • 1
🔥 FLUX Unlimited

Space • Running • 1.42k • Use the FLUX model as much as you want.
HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23 • 24.9k • 91

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 494
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Paper • 2510.07499 • Published Oct 8 • 48
Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Paper • 2509.13683 • Published Sep 17 • 8
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering

Paper • 2509.00798 • Published Aug 31 • 1

Voice2Voice models

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 37
Qwen2.5-1M Technical Report

Paper • 2501.15383 • Published Jan 26 • 72
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376
Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 152

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Paper • 2308.01390 • Published Aug 2, 2023 • 33
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166

My reading list

Papers that I've read or want to read

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Paper • 2503.22952 • Published Mar 29 • 17
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
