Submitted by
Darshan Deshpande
AI & ML interests
LLM Evaluation
Papers
Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments