Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving • Paper • 2601.22032 • Published 5 days ago • 3
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding • Paper • 2601.23161 • Published 4 days ago • 9
NativeTok: Native Visual Tokenization for Improved Image Generation • Paper • 2601.22837 • Published 4 days ago • 8
DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning • Paper • 2601.21716 • Published 5 days ago • 12
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment • Paper • 2601.20218 • Published 6 days ago • 13
FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation • Paper • 2601.23182 • Published 4 days ago • 18
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas • Paper • 2601.21558 • Published 5 days ago • 51
ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation • Paper • 2601.21420 • Published 5 days ago • 40
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives • Paper • 2601.20833 • Published 6 days ago • 169
Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts • Paper • 2601.22156 • Published 5 days ago • 10
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning • Paper • 2601.22069 • Published 5 days ago • 7
LoL: Longer than Longer, Scaling Video Generation to Hour • Paper • 2601.16914 • Published 11 days ago • 19
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation • Paper • 2601.22153 • Published 5 days ago • 67
Towards Pixel-Level VLM Perception via Simple Points Prediction • Paper • 2601.19228 • Published 7 days ago • 16