Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published 9 days ago • 16
ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models Paper • 2509.21991 • Published Sep 26, 2025 • 5
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9, 2025 • 27
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published Apr 1, 2025 • 15