WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation Paper • 2508.16763 • Published Aug 22 • 2
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning Paper • 2508.09804 • Published Aug 13
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10 • 105
Rendering-Aware Reinforcement Learning for Vector Graphics Generation Paper • 2505.20793 • Published May 27 • 12
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation Paper • 2407.06423 • Published Jul 8, 2024
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19 • 2
StarFlow: Generating Structured Workflow Outputs From Sketch Images Paper • 2503.21889 • Published Mar 27 • 2
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published Apr 23 • 7
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39