Improve model card for VisionReasoner-7B (Seg-Zero framework)

by nielsr HF Staff - opened Jul 1, 2025

This PR significantly improves the model card for Ricky06662/VisionReasoner-7B by:

Updating the pipeline_tag to image-segmentation, which reflects the model's core task of generating pixel-level masks and makes the model easier to discover on the Hugging Face Hub.
Clarifying the paper references: the card now presents "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" as the primary paper the model builds on, and also links to "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning".
Correcting code and project page links to the official Seg-Zero GitHub repository: https://github.com/dvlab-research/Seg-Zero.
Expanding the model description with detailed insights from the paper abstract and GitHub README, covering its decoupled architecture, reinforcement learning training, zero-shot generalization capabilities, and performance highlights.
Adding the datasets Ricky06662/refCOCOg_9k_840 and Ricky06662/VisionReasoner_multi_object_7k_840 to the metadata for completeness.
Embedding key visuals (overview, pipeline, examples) directly from the GitHub README to provide richer visual context.
Including a comprehensive citation section with BibTeX entries for both associated papers.

These changes ensure the model card is more informative, accurate, and discoverable for researchers and users.
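
For readers unfamiliar with how these metadata changes take effect: Hub model cards carry their metadata as a YAML block at the top of README.md. The sketch below renders only the two fields named in this PR (pipeline_tag and datasets) into that form. The helper `render_front_matter` is hypothetical and written for illustration; the merged card may contain further fields that this description does not list, and they are deliberately not guessed here.

```python
def render_front_matter(metadata: dict) -> str:
    """Render a flat dict (string or list-of-string values) as YAML front matter.

    Hypothetical helper for illustration only; it handles just the two
    value shapes needed for the fields mentioned in this PR.
    """
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # A list becomes a YAML block sequence under its key.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


# Only the fields named in the PR description; the real card may have more.
metadata = {
    "pipeline_tag": "image-segmentation",
    "datasets": [
        "Ricky06662/refCOCOg_9k_840",
        "Ricky06662/VisionReasoner_multi_object_7k_840",
    ],
}

front_matter = render_front_matter(metadata)
print(front_matter)
```

The printed block is what would sit above the Markdown body of the card; the Hub reads pipeline_tag from it to decide which task filter the model appears under.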

Ricky06662 changed pull request status to merged Jul 2, 2025
