Collections
Collections including paper arxiv:2510.24821

- Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution
  Paper • 2510.18019 • Published • 17
- PORTool: Tool-Use LLM Training with Rewarded Tree
  Paper • 2510.26020 • Published • 4
- POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
  Paper • 2510.24992 • Published • 2
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 37

- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
  Paper • 2504.02821 • Published • 9
- TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
  Paper • 2504.17343 • Published • 13
- ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
  Paper • 2504.15921 • Published • 7
- Causal-Copilot: An Autonomous Causal Analysis Agent
  Paper • 2504.13263 • Published • 7

- meituan-longcat/LongCat-Flash-Omni
  Any-to-Any • 561B • Updated • 162 • 100
- LongCat-Flash-Omni Technical Report
  Paper • 2511.00279 • Published • 22
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 89
- nvidia/omnivinci
  Feature Extraction • Updated • 6.94k • 163

- inclusionAI/Ming-flash-omni-Preview
  Any-to-Any • 104B • Updated • 7.94k • 65
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
  Paper • 2510.24821 • Published • 37
- inclusionAI/MingTok-Vision
  Image Feature Extraction • 0.7B • Updated • 265 • 31
- inclusionAI/Ming-UniVision-16B-A3B
  Any-to-Any • 19B • Updated • 44 • 60

- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34