Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14, 2025 • 27
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published Apr 17, 2025 • 19
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 4.08M • 635
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20, 2025 • 52
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization Paper • 2504.12083 • Published Apr 16, 2025 • 3
D-Fine - SOTA Real-Time Object Detector ⚡ Space • Object Detection on Images and Video • Running on Zero • Featured • 88
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published Jun 3, 2025 • 53
Leaderboard: Physical Reasoning from Video 🏃 Space • Submit model evaluations and view leaderboard results • Running • 44