Boqiang Zhang

Cyril666

https://cyrilsterling.github.io/

CyrilSterling

AI & ML interests

Multi-modal Large Language Models, Vision-Language-Action Models

authored 7 papers 2 days ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22, 2025 • 90

What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness

Paper • 2502.14914 • Published Feb 19, 2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Paper • 2509.21268 • Published Sep 25, 2025 • 104

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Paper • 2512.16561 • Published Dec 18, 2025 • 20

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Paper • 2410.13185 • Published Oct 17, 2024 • 5

Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Paper • 2407.05562 • Published Jul 8, 2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 46