Recent Activity
MultiRL/qwen3_4b_easy_rl_new
4B • Updated
MultiRL/qwen3_1.7b_easy_rl_gspo
2B • Updated • 1
4B • Updated • 1
MultiRL/qwen3_1.7b_easy_rl_final_step120
2B • Updated
MultiRL/qwen3_4b_medium_rl_final
4B • Updated
MultiRL/qwen3_4b_sft_one_act
4B • Updated • 1
MultiRL/qwen3_1.7b_easy_rl_reinforce_ori
2B • Updated • 4
MultiRL/qwen3_1.7b_easy_rl_reinforce_alpha_0.5
2B • Updated • 1
MultiRL/qwen3_1.7b_easy_rl_reinforce_alpha_1
2B • Updated • 2
MultiRL/qwen3_1.7b_easy_rl_reinforce_alpha_0
2B • Updated • 2
MultiRL/qwen3_1.7b_sft_one_act
2B • Updated • 1
MultiRL/qwen3_1.7b_easy_rl_final
2B • Updated • 1
MultiRL/qwen3_4b_easy_rl_final
4B • Updated
MultiRL/qwen3_1.7b_sft_final
2B • Updated • 2
MultiRL/qwen3_4b_sft_final
4B • Updated • 1
MultiRL/qwen3_1.7b_easy_rl_new
MultiRL/qwen3_4b_standard_medium_rl
MultiRL/qwen3_4b_standard_easy_rl
4B • Updated • 1
MultiRL/qwen3_4b_medium_rl_progress_C
MultiRL/qwen3_4b_medium_rl
MultiRL/qwen3_1.7b_easy_rl_test_task_group
2B • Updated • 7
MultiRL/qwen3_1.7b_easy_rl_test
2B • Updated • 6
MultiRL/qwen3_1.7b_sudoku_sft
2B • Updated • 1
MultiRL/qwen3_1.7b_easy_reinforce_batch_32_by_pass
2B • Updated
MultiRL/qwen3_1.7b_easy_reinforce_batch_64_by_pass
2B • Updated
MultiRL/qwen3_1.7b_easy_reinforce_test
2B • Updated
MultiRL/qwen3_1.7b_C_easy_gspo_test
2B • Updated
MultiRL/qwen3_1.7b_base_C_normal_short_sft_lr_1e_5_C_easy_grpo_step70
2B • Updated
MultiRL/qwen3_1.7b_C_short_sft_lr_1e_5_C_easy_reinforce_step80
2B • Updated
MultiRL/qwen3_1.7b_base_C_normal_concise_sft_lr_5e_6
2B • Updated