Qwen3 GRPO-trained w/ thinksafe
Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-500
Text Generation • Updated
Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-1000
Text Generation • Updated
Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-1500
Text Generation • Updated
Sangsang/thinksafe-0.6B-n1-ablation_R32_BZ32_Gen8_checkpoint-2000
Text Generation • Updated