Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20, 2025 • 29
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence Paper • 2506.15677 • Published Jun 18, 2025 • 23
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29, 2025 • 23
Towards Understanding Camera Motions in Any Video Paper • 2504.15376 • Published Apr 21, 2025 • 155
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Paper • 2311.03354 • Published Nov 6, 2023 • 7