MultiRL

non-profit

AI & ML interests

None defined yet.

Recent Activity

KimSHine updated a model 7 minutes ago

MultiRL/qwen3_1.7b_sudoku_multi_action_group_norm

KimSHine published a model 12 minutes ago

MultiRL/qwen3_1.7b_sudoku_multi_action_group_norm

KimSHine updated a model 12 minutes ago

MultiRL/qwen3_1.7b_sudoku_multi_action_group_norm_epoch3

MultiRL's models (182)

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_1_98_gem_ms_seq_is

2B • Updated Jan 10 • 1

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_1_98_mask_only

2B • Updated Jan 10 • 1

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_995_98_ori_norm

2B • Updated Jan 6 • 1

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_gamma_995_98

2B • Updated Jan 5

MultiRL/qwen3_1.7b_sft_final_easy_reinforce_ours_adv_fixed_gamma_0.9

2B • Updated Jan 2 • 3

MultiRL/qwen3_1.7b_easy_rl_old_adv_fixed

2B • Updated Jan 1 • 3

MultiRL/qwen3_1.7b_easy_rl_fixed_gamma_1

2B • Updated Dec 30, 2025 • 3

MultiRL/qwen3_1.7b_easy_rl_old_adv_final_fixed_sequence_max_token_norm_batch_128

2B • Updated Dec 28, 2025

MultiRL/qwen3_1.7b_medium_rl_ours_adv_final_fixed_sequence_gamma_1

2B • Updated Dec 28, 2025

MultiRL/qwen3_1.7b_medium_rl_ours_adv_fixed_sequence_from_epoch_3

2B • Updated Dec 27, 2025

MultiRL/qwen3_1.7b_easy_rl_ours_adv_final_fixed_sequence_max_token_norm

2B • Updated Dec 27, 2025

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_sequence_batch_128

2B • Updated Dec 26, 2025 • 1

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_sequence_epoch_3

2B • Updated Dec 26, 2025 • 1

MultiRL/qwen3_1.7b_easy_rl_ours_adv_fixed_token

2B • Updated Dec 26, 2025

MultiRL/qwen3_1.7b_easy_rl_gamma_1_step_40

2B • Updated Dec 24, 2025

MultiRL/qwen3_4b_easy_rl_our_adv_final

4B • Updated Dec 22, 2025 • 1

MultiRL/qwen3_1.7b_easy_rl_final_group_norm

2B • Updated Dec 22, 2025 • 2

MultiRL/qwen3_1.7b_easy_rl_final_gamma_1

2B • Updated Dec 18, 2025 • 1

MultiRL/qwen3_4b_base_easy_rl_final

4B • Updated Dec 18, 2025 • 1

MultiRL/qwen3_4b_base_sft_final

4B • Updated Dec 17, 2025 • 1

MultiRL/qwen3_4b_easy_rl_new

4B • Updated Dec 16, 2025

MultiRL/qwen3_1.7b_easy_rl_gspo

2B • Updated Dec 16, 2025 • 1

MultiRL/qwen3_4b_sft_new

4B • Updated Dec 15, 2025 • 1

MultiRL/qwen3_1.7b_easy_rl_final_step120

2B • Updated Dec 15, 2025

MultiRL/qwen3_4b_medium_rl_final

4B • Updated Dec 15, 2025

MultiRL/qwen3_4b_sft_one_act

4B • Updated Dec 14, 2025 • 1

MultiRL/qwen3_1.7b_easy_rl_reinforce_ori

2B • Updated Dec 14, 2025 • 3

MultiRL/qwen3_1.7b_easy_rl_reinforce_alpha_0.5

2B • Updated Dec 14, 2025 • 1

MultiRL/qwen3_1.7b_easy_rl_reinforce_alpha_1

2B • Updated Dec 14, 2025 • 2

MultiRL/qwen3_1.7b_easy_rl_reinforce_alpha_0

2B • Updated Dec 14, 2025 • 2