Collections
Collections including paper arxiv:2510.24821

- Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
  Paper • 2510.18019 • Published • 17
- PORTool: Tool-Use LLM Training with Rewarded Tree
  Paper • 2510.26020 • Published • 4
- POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
  Paper • 2510.24992 • Published • 2
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 37

- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
  Paper • 2504.02821 • Published • 9
- TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
  Paper • 2504.17343 • Published • 13
- ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
  Paper • 2504.15921 • Published • 7
- Causal-Copilot: An Autonomous Causal Analysis Agent
  Paper • 2504.13263 • Published • 7

- meituan-longcat/LongCat-Flash-Omni
  Any-to-Any • 561B • Updated • 162 • 100
- LongCat-Flash-Omni Technical Report
  Paper • 2511.00279 • Published • 22
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 89
- nvidia/omnivinci
  Feature Extraction • Updated • 6.94k • 163

- inclusionAI/Ming-flash-omni-Preview
  Any-to-Any • 104B • Updated • 7.94k • 65
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 37
- inclusionAI/MingTok-Vision
  Image Feature Extraction • 0.7B • Updated • 265 • 31
- inclusionAI/Ming-UniVision-16B-A3B
  Any-to-Any • 19B • Updated • 44 • 60

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34