Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling Paper • 2512.12675 • Published 22 days ago • 40
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder Paper • 2512.11749 • Published 24 days ago • 38
Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17, 2025 • 49
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining Paper • 2507.14119 • Published Jul 18, 2025 • 58
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published Jul 22, 2025 • 35
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting Paper • 2506.09952 • Published Jun 11, 2025 • 6
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published May 8, 2025 • 86
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31, 2025 • 76
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26, 2025 • 56
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21, 2025 • 61
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Paper • 2503.14487 • Published Mar 18, 2025 • 28
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14, 2025 • 146
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published Mar 10, 2025 • 61
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13, 2025 • 43
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 433