Qwen3-Base post-trained checkpoints for our paper, Maximum Likelihood Reinforcement Learning [https://zanette-labs.github.io/MaxRL/]
ftajwar/qwen3_4B_Base_MaxRL_Polaris_1000_steps
Text Generation • 4B
ftajwar/qwen3_4B_Base_GRPO_Polaris_1000_steps
Text Generation • 4B
ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps
Text Generation • 2B
ftajwar/qwen3_1.7B_Base_GRPO_Polaris_1000_steps
Text Generation • 2B
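
A minimal sketch of how one of these checkpoints might be loaded, assuming they work with the standard Hugging Face `transformers` Auto classes (this is not confirmed by the listing itself). The `CHECKPOINTS` table and the `load_checkpoint` helper are hypothetical names introduced only for illustration; the repository IDs are copied from the list above.

```python
# Repository IDs copied verbatim from the collection listing,
# keyed by (model size, training method).
CHECKPOINTS = {
    ("4B", "MaxRL"): "ftajwar/qwen3_4B_Base_MaxRL_Polaris_1000_steps",
    ("4B", "GRPO"): "ftajwar/qwen3_4B_Base_GRPO_Polaris_1000_steps",
    ("1.7B", "MaxRL"): "ftajwar/qwen3_1.7B_Base_MaxRL_Polaris_1000_steps",
    ("1.7B", "GRPO"): "ftajwar/qwen3_1.7B_Base_GRPO_Polaris_1000_steps",
}


def load_checkpoint(size: str, method: str):
    """Hypothetical helper: return (tokenizer, model) for one checkpoint.

    Assumes the repository loads with the generic Auto classes.
    Weights are downloaded from the Hub on first use.
    """
    # Imported lazily so the ID table is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = CHECKPOINTS[(size, method)]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_checkpoint("1.7B", "MaxRL")` would fetch the smaller MaxRL checkpoint; swapping the method to `"GRPO"` gives the matching baseline at the same size, which is convenient for side-by-side comparison.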