GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization Paper • 2511.15705 • Published 21 days ago • 92
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30 • 81
UME-R1 Collection UME-R1 is a framework designed to endow multimodal embedding models with the flexibility to switch between discriminative and generative embeddings • 4 items • Updated Nov 4 • 8
RoboOmni: Proactive Robot Manipulation in Omni-modal Context Paper • 2510.23763 • Published Oct 27 • 53
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models Paper • 2510.13626 • Published Oct 15 • 45
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations Paper • 2506.04633 • Published Jun 5 • 19
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19 • 26
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18 • 24
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10 • 61
CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published Mar 24 • 30