VLM - a NothingLQH Collection

VLM

updated Jun 16, 2025

FocusedAD: Character-centric Movie Audio Description

Paper • 2504.12157 • Published Apr 16, 2025 • 8
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Paper • 2504.10465 • Published Apr 14, 2025 • 27
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17, 2025 • 19
OS-Copilot/OS-Atlas-Base-7B

Image-Text-to-Text • 8B • Updated Nov 19, 2024 • 399 • 42
google/siglip-so400m-patch14-384

Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 4.08M • 635
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

Paper • 2505.14231 • Published May 20, 2025 • 52
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization

Paper • 2504.12083 • Published Apr 16, 2025 • 3
D-Fine - SOTA Real-Time Object Detector ⚡

Space • Running on Zero • Featured • 88 • Object Detection on Images and Video
ByteDance-Seed/BAGEL-7B-MoT

Any-to-Any • 15B • Updated Dec 8, 2025 • 609 • 1.17k
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3, 2025 • 53
Leaderboard: Physical Reasoning from Video 🏃

Space • Running • 44 • Submit model evaluations and view leaderboard results
Gaze LLE 👀

Space • Running on Zero • MCP • 29 • Gaze Target Estimation
Hcompany/Holo1-7B

Image-Text-to-Text • 8B • Updated Jun 10, 2025 • 168 • 224
OpenGVLab/InternVL3-9B

Image-Text-to-Text • 9B • Updated May 29, 2025 • 2.93k • 25