Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Paper • 2410.04612 • Published Oct 6, 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published May 30, 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards Paper • 2404.16767 • Published Apr 25, 2024
Provable Reward-Agnostic Preference-Based Reinforcement Learning Paper • 2305.18505 • Published May 29, 2023
Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models Paper • 2306.11281 • Published Jun 20, 2023
StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments Paper • 2401.04290 • Published Jan 9, 2024
Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests Paper • 2107.06929 • Published Jul 14, 2021
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second Paper • 2306.07552 • Published Jun 13, 2023