HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent Paper • 2402.01018 • Published Feb 1, 2024 • 2
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense Paper • 2510.16259 • Published Oct 17 • 3
Tulu3 with distraction mitigation data Collection LLMs and LRMs can be easily distracted by hidden instructions or irrelevant tasks. We curated SFT and DPO data that a model can be fine-tuned on to avoid distract • 5 items • Updated Oct 30 • 2
The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs Paper • 2510.09905 • Published Oct 10 • 6
FiSCo: Evaluating LLM's Group Level Fairness Collection Generated Questions for group fairness evaluation • 6 items • Updated Oct 6 • 2
LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework Paper • 2507.04723 • Published Jul 7 • 11
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs Paper • 2507.05687 • Published Jul 8 • 27
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning Paper • 2505.08054 • Published May 12 • 3
SATA-BENCH: Select All That Apply Benchmark for Multiple Choice Questions Paper • 2506.00643 • Published May 31 • 6