Qian Liu's picture

Qian Liu

SivilTaram

·

http://siviltaram.github.io/

AI & ML interests

Cooking cool things

Recent Activity

liked a model 7 days ago

deepseek-ai/DeepSeek-V3.2

upvoted a paper 7 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

upvoted a paper 7 days ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

View all activity

Organizations

upvoted 2 papers 7 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 8 days ago • 83

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published 16 days ago • 246

upvoted 2 papers about 1 month ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5 • 124

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29 • 219

upvoted a paper about 2 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9 • 36

upvoted a paper 2 months ago

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30 • 47

upvoted 6 papers 3 months ago

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Paper • 2509.07969 • Published Sep 9 • 59

Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7 • 149

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

Paper • 2509.02544 • Published Sep 2 • 124

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 75

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80

upvoted 6 papers 4 months ago

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 129

FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Paper • 2508.11987 • Published Aug 16 • 71

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 121

Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published Jul 24 • 85

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7 • 130

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4 • 132

upvoted 2 papers 5 months ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 315

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16 • 42