WildEval

non-profit

wild_eval

AI & ML interests

None defined yet.

Recent Activity

faezeb

authored a paper 16 days ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published 16 days ago • 55

ChengsongHuang

authored a paper 21 days ago

VisPlay: Self-Evolving Vision-Language Models from Images

Paper • 2511.15661 • Published 21 days ago • 42

yuntian-deng

authored a paper about 2 months ago

TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar

Paper • 2510.14972 • Published Oct 16 • 33

DongfuJiang

authored 2 papers 2 months ago

Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning

Paper • 2509.22824 • Published Sep 26 • 20

VideoScore2: Think before You Score in Generative Video Evaluation

Paper • 2509.22799 • Published Sep 26 • 25

yuntian-deng

authored 2 papers 2 months ago

Interactive Training: Feedback-Driven Neural Network Optimization

Paper • 2510.02297 • Published Oct 2 • 42

Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

Paper • 2510.00184 • Published Sep 30 • 16

ChengsongHuang

authored a paper 3 months ago

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 101

DongfuJiang

authored a paper 3 months ago

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 75

ChengsongHuang

authored a paper 3 months ago

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Paper • 2508.19652 • Published Aug 27 • 84

ChengsongHuang

authored a paper 4 months ago

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7 • 130

valpy

authored 4 papers 5 months ago

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 22

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Paper • 2502.08395 • Published Feb 12

RewardBench 2: Advancing Reward Model Evaluation

Paper • 2506.01937 • Published Jun 2 • 7

Generalizing Verifiable Instruction Following

Paper • 2507.02833 • Published Jul 3 • 1

yuntian-deng

authored a paper 5 months ago

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models

Paper • 2507.08800 • Published Jul 11 • 80

ChengsongHuang

authored a paper 6 months ago

POSS: Position Specialist Generates Better Draft for Speculative Decoding

Paper • 2506.03566 • Published Jun 4 • 6

DongfuJiang

authored 2 papers 7 months ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Paper • 2505.20139 • Published May 26 • 19

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22 • 41

yuntian-deng

authored a paper 7 months ago

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Paper • 2505.15612 • Published May 21 • 34

Team members 9
