ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems Paper • 2503.20756 • Published Mar 26, 2025 • 7
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 99
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 216
CaptionQA: Is Your Caption as Useful as the Image Itself? Paper • 2511.21025 • Published Nov 26, 2025 • 28
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published Feb 12, 2026 • 60
PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 29 days ago • 31
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 15 days ago • 52
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding Paper • 2603.13366 • Published 15 days ago • 93