UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks Paper • 2507.11336 • Published Jul 15, 2025 • 6
Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning Paper • 2512.19687 • Published 10 days ago • 1
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement Paper • 2512.13303 • Published 17 days ago • 16
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance Paper • 2510.24711 • Published Oct 28, 2025 • 19
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29, 2025 • 44
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 165
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 87
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Paper • 2503.11579 • Published Mar 14, 2025 • 21