ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published 14 days ago • 100
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published 9 days ago • 48
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 115
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application Paper • 2510.19631 • Published Oct 22 • 27
DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking Paper • 2510.20168 • Published Oct 23 • 27
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents Paper • 2510.14438 • Published Oct 16 • 13
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset Paper • 2109.07679 • Published Sep 16, 2021
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph Paper • 2311.09174 • Published Nov 15, 2023
AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation Paper • 2402.10646 • Published Feb 16, 2024
CKBP v2: Better Annotation and Reasoning for Commonsense Knowledge Base Population Paper • 2304.10392 • Published Apr 20, 2023
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression Paper • 2509.15763 • Published Sep 19
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents Paper • 2510.07172 • Published Oct 8 • 28
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering Paper • 2510.01591 • Published Oct 2 • 26
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection Paper • 2310.09044 • Published Oct 13, 2023
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding Paper • 2310.12874 • Published Oct 19, 2023
CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering Paper • 2305.14869 • Published May 24, 2023 • 1