Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16, 2025 • 43
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published Jun 6, 2025 • 30
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published Jun 2, 2025 • 28
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 28
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning Paper • 2505.20355 • Published May 26, 2025 • 36
view article Article Finally, a Replacement for BERT: Introducing ModernBERT +13 Dec 19, 2024 • 715
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 301
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19, 2025 • 69
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 166
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13, 2025 • 148