AI & ML interests
Evaluating AI Agents on Continuous Tasks
Organization Card
Evaluate AI on Continuous Tasks
EvoClaw is a general-purpose evaluation harness for AI agents on continuous tasks, where milestones build on each other, dependencies interleave, and context accumulates over a long session. Unlike one-shot benchmarks, EvoClaw challenges agents to complete ordered sequences of tasks within a persistent environment, enabling fine-grained, per-milestone analysis.
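As a rough illustration of what per-milestone scoring over a persistent environment could look like, here is a minimal Python sketch. It is not EvoClaw's actual API: the `Milestone` type, the `evaluate` function, the `"pass"`/`"fail"`/`"blocked"` labels, and the dictionary standing in for the environment are all hypothetical names chosen for this example. The sketch assumes milestones arrive already ordered and that a milestone whose dependency did not pass is reported separately from one that ran and failed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Env = Dict[str, int]  # stand-in for a persistent environment (hypothetical)


@dataclass
class Milestone:
    """Hypothetical milestone record; not taken from EvoClaw itself."""
    name: str
    depends_on: List[str]
    run: Callable[[Env], None]    # the agent's attempt; mutates the shared env
    check: Callable[[Env], bool]  # grader that inspects the env afterwards


def evaluate(milestones: List[Milestone]) -> Dict[str, str]:
    """Run milestones in order against one shared env and score each one."""
    env: Env = {}                 # state persists across all milestones
    results: Dict[str, str] = {}
    for m in milestones:
        # A milestone is blocked if any dependency did not pass earlier.
        if any(results.get(dep) != "pass" for dep in m.depends_on):
            results[m.name] = "blocked"
            continue
        try:
            m.run(env)
            results[m.name] = "pass" if m.check(env) else "fail"
        except Exception:
            # An attempt that crashes counts as a failure of that milestone.
            results[m.name] = "fail"
    return results


def build_demo() -> List[Milestone]:
    def setup(env: Env) -> None:
        env["a"] = 1

    def broken(env: Env) -> None:
        env["b"] = env["missing"]  # raises KeyError on purpose

    def extend(env: Env) -> None:
        env["c"] = env["a"] + 1    # relies on state left by "setup"

    def combine(env: Env) -> None:
        env["d"] = env["b"] + env["c"]

    return [
        Milestone("setup", [], setup, lambda e: e.get("a") == 1),
        Milestone("broken", ["setup"], broken, lambda e: "b" in e),
        Milestone("extend", ["setup"], extend, lambda e: e.get("c") == 2),
        Milestone("combine", ["broken", "extend"], combine, lambda e: "d" in e),
    ]


demo_results = evaluate(build_demo())
print(demo_results)
```

Separating "blocked" from "fail" is one possible design choice for fine-grained analysis: it distinguishes a milestone the agent got wrong from one it never had a fair chance to attempt because an earlier dependency broke.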