CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing Paper • 2602.15823 • Published 4 days ago • 2
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published Oct 10, 2025 • 17
Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes Paper • 2601.18795 • Published 26 days ago • 1
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking Paper • 2412.09544 • Published Dec 12, 2024