Exp3-FactR: FactR-Only GRPO ablation. Exponential(a=1), behavior=0, facts=1. 240 steps, Qwen2.5-7B.
YOULING HUANG
Ricardo-H
Recent Activity
published a model about 23 hours ago: Ricardo-H/ws-wm-0206-step-30
updated a collection 2 days ago: ws-wm-0224
updated a collection 2 days ago: ws-wm-0224