FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 339
Grad2Reward: From Sparse Judgment to Dense Rewards for Improving Open-Ended LLM Reasoning Paper • 2602.01791 • Published Feb 2 • 1
Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs Article • 14 days ago • 29
How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs Article • 16 days ago • 59
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning Paper • 2603.05863 • Published Mar 6 • 6
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published 20 days ago • 232
Token Warping Helps MLLMs Look from Nearby Viewpoints Paper • 2604.02870 • Published 20 days ago • 34