Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models • Paper • 2512.01949 • Published 10 days ago • 8
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization • Paper • 2511.22586 • Published 15 days ago • 6
Monet: Reasoning in Latent Visual Space Beyond Images and Language • Paper • 2511.21395 • Published 16 days ago • 15
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens • Paper • 2511.19418 • Published 17 days ago • 27
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models • Paper • 2511.14582 • Published 24 days ago • 17
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • Paper • 2511.09611 • Published 29 days ago • 68
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper • 2511.13612 • Published 25 days ago • 132
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data • Paper • 2511.12609 • Published 26 days ago • 102
Depth Anything 3: Recovering the Visual Space from Any Views • Paper • 2511.10647 • Published 28 days ago • 93
Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs • Paper • 2511.05933 • Published Nov 8 • 7
Music Flamingo: Scaling Music Understanding in Audio Language Models • Paper • 2511.10289 • Published 29 days ago • 10
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds • Paper • 2511.08892 • Published about 1 month ago • 195
The Path Not Taken: RLVR Provably Learns Off the Principals • Paper • 2511.08567 • Published about 1 month ago • 32
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale • Paper • 2511.05705 • Published Nov 7 • 6