M Saad Salman

MSS444

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

upvoted a paper 1 day ago

LoopViT: Scaling Visual ARC with Looped Transformers

upvoted a paper 1 day ago

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training

Organizations

None yet

upvoted 17 papers 1 day ago

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Paper • 2602.02477 • Published 3 days ago • 7

LoopViT: Scaling Visual ARC with Looped Transformers

Paper • 2602.02156 • Published 3 days ago • 10

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training

Paper • 2602.01511 • Published 3 days ago • 12

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Paper • 2602.02486 • Published 3 days ago • 14

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Paper • 2602.01058 • Published 4 days ago • 38

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published 3 days ago • 190

SimpleGPT: Improving GPT via A Simple Normalization Strategy

Paper • 2602.01212 • Published 4 days ago • 2

Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation

Paper • 2602.03806 • Published 1 day ago • 4

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

Paper • 2602.01053 • Published 4 days ago • 6

Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

Paper • 2602.00747 • Published 5 days ago • 8

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published 2 days ago • 11

Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification

Paper • 2601.21244 • Published 7 days ago • 12

No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs

Paper • 2602.02103 • Published 3 days ago • 61

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

Paper • 2602.03411 • Published 2 days ago • 33

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Paper • 2602.03419 • Published 2 days ago • 36

MARS: Modular Agent with Reflective Search for Automated AI Research

Paper • 2602.02660 • Published 3 days ago • 52

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Paper • 2602.01785 • Published 3 days ago • 84

upvoted 3 papers 2 days ago

Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry

Paper • 2601.22588 • Published 6 days ago • 4

Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning

Paper • 2602.00759 • Published 5 days ago • 5

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Paper • 2602.01382 • Published 4 days ago • 8