FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model Paper • 2510.10921 • Published Oct 13 • 10
Trace Anything: Representing Any Video in 4D via Trajectory Fields Paper • 2510.13802 • Published Oct 15 • 30
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 70
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21 • 409