Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability Paper • 2602.02477 • Published 3 days ago • 7
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training Paper • 2602.01511 • Published 3 days ago • 12
RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents Paper • 2602.02486 • Published 3 days ago • 14
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published 4 days ago • 38
SimpleGPT: Improving GPT via A Simple Normalization Strategy Paper • 2602.01212 • Published 4 days ago • 2
Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation Paper • 2602.03806 • Published 1 day ago • 4
LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents Paper • 2602.01053 • Published 4 days ago • 6
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training Paper • 2602.00747 • Published 5 days ago • 8
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection Paper • 2602.03216 • Published 2 days ago • 11
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification Paper • 2601.21244 • Published 7 days ago • 12
No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs Paper • 2602.02103 • Published 3 days ago • 61
SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training Paper • 2602.03411 • Published 2 days ago • 33
SWE-World: Building Software Engineering Agents in Docker-Free Environments Paper • 2602.03419 • Published 2 days ago • 36
MARS: Modular Agent with Reflective Search for Automated AI Research Paper • 2602.02660 • Published 3 days ago • 52
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published 3 days ago • 84
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry Paper • 2601.22588 • Published 6 days ago • 4
Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning Paper • 2602.00759 • Published 5 days ago • 5
PromptRL: Prompt Matters in RL for Flow-Based Image Generation Paper • 2602.01382 • Published 4 days ago • 8