Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16, 2025 • 41
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28, 2025 • 36
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Paper • 2502.02508 • Published Feb 4, 2025 • 22
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published Feb 5, 2025 • 18
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization Paper • 2502.04306 • Published Feb 6, 2025 • 20
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published Feb 6, 2025 • 25