AI & ML interests
Building breakthrough AI to solve the world's biggest problems.
Papers
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs
Organization Card
Spaces 13
AstaBench Leaderboard 🥇
pinned • Running • 20
View benchmark leaderboards
Reward Bench Leaderboard 📐
pinned • Running • 422
Explore RewardBench model rankings and scores
HREF Leaderboard 📐
pinned • Running • 2
Browse and search HREF leaderboard data
Zebra Logic Bench 🦓
pinned • Running • 91
Show leaderboard and explore model puzzle results
SUPER Leaderboard 🤖
pinned • Running • 3
Display a static leaderboard from a JSON file
ZeroEval Leaderboard 📊
pinned • Running • 53
Embed ZeroEval for evaluation
Models 858
allenai/ACE2-ERA5
66 • 15
allenai/Olmo-Hybrid-7B
Text Generation • 16.4k • 44
allenai/Olmo-Hybrid-Think-SFT-7B
Text Generation • 716 • 11
allenai/Olmo-Hybrid-Instruct-DPO-7B
Text Generation • 7B • 2.71k • 15
allenai/Olmo-Hybrid-Instruct-SFT-7B
Text Generation • 1.51k • 13
allenai/FlexOlmo-7x7B-1T-RT
Text Generation • 33B • 127 • 7
allenai/FlexOlmo-7x7B-1T
Text Generation • 33B • 276 • 39
allenai/Flex-public-7B-1T
Text Generation • 7B • 298 • 5
allenai/Flex-reddit-2x7B-1T
Text Generation • 12B • 4.77k • 7
allenai/Flex-pes2o-2x7B-1T
Text Generation • 12B • 196 • 2
Datasets 420
allenai/asta-summary-citation-counts
Viewer • 49.2M • 466 • 8
allenai/Sera-4.5A-Full-T1
Viewer • 48.3k • 102 • 1
allenai/Sera-4.5A-Lite-T1
Viewer • 24.5k • 90 • 3
allenai/Sera-4.6-Lite-T1
Viewer • 24.6k • 72 • 1
allenai/Sera-4.5A-Full-T2
Viewer • 33.9k • 78 • 1
allenai/Sera-4.5A-Lite-T2
Viewer • 23.9k • 132 • 3
allenai/Sera-4.6-Lite-T2
Viewer • 25.2k • 332 • 9
allenai/Sera-4.6-Lite-47000
Viewer • 31k • 184 • 1
allenai/Molmo2-VideoPoint
Viewer • 1.32M • 395 • 5
allenai/Dolci-Think-SFT-Olmo-Hybrid-Tool-Use-SA
Viewer • 1.6k • 76 • 6