Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 9 days ago • 83
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published 15 days ago • 54
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments Paper • 2511.07317 • Published 29 days ago • 13
Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models Jul 18 • 50
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees Paper • 2503.08893 • Published Mar 11 • 6
RecA Collection Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning! • 8 items • Updated Sep 22 • 13
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 75
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published Jul 25 • 28
Article Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Jul 10 • 52
Spurious Rewards Collection Spurious Rewards: Rethinking Training Signals in RLVR • 14 items • Updated Jun 13 • 2
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation Paper • 2505.17613 • Published May 23 • 8
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models Paper • 2505.17015 • Published May 22 • 9
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20 • 62
One-Shot RLVR Collection Collections of models and papers for works: "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" • 24 items • Updated Nov 3 • 1