Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published 4 days ago • 59
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published 8 days ago • 20
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding Paper • 2512.13586 • Published 11 days ago • 87
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 24 days ago • 133
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground Paper • 2512.10430 • Published 15 days ago • 112
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO Paper • 2511.13288 • Published Nov 17 • 17
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 45
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism Paper • 2511.11373 • Published Nov 14 • 12
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published Sep 28 • 8