Semantic Soft Bootstrapping Collection A self-distillation-based training method for long-context reasoning in a single LLM without reinforcement learning • 3 items • Updated 6 days ago
MOTIF paper Collection The MOTIF-trained model and the vanilla GRPO-trained model compared in the paper. • 3 items • Updated 7 days ago