view article Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments +3 Feb 12 • 31
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 220
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders Paper • 2602.05027 • Published Feb 4 • 62
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 34