Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
authored a paper about 11 hours ago
AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents
updated a dataset about 12 hours ago
Keven16/OPSD-Example-Data
published a dataset about 12 hours ago
Keven16/OPSD-Example-Data
Organizations
None yet