Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation • Paper • 2510.22115 • Published Oct 25 • 83
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning • Paper • 2510.19338 • Published Oct 22 • 114
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training • Paper • 2507.17634 • Published Jul 23 • 2