Zhou

FireFlyCourageous

Lattic-zjj

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks

Paper • 2507.11336 • Published Jul 15, 2025 • 6

upvoted a paper 9 days ago

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Paper • 2512.19687 • Published 10 days ago • 1

upvoted an article 13 days ago

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21, 2025 • 193

upvoted a paper 16 days ago

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Paper • 2512.13303 • Published 17 days ago • 16

liked a dataset about 1 month ago

nyu-visionx/VSI-590K

Preview • Updated Nov 7, 2025 • 3.22k • 9

upvoted a collection 2 months ago

Emu3.5

Collection • Native Multimodal Models are World Learners 🌍 • 4 items • Updated 8 days ago • 72

upvoted 2 papers 2 months ago

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Paper • 2510.24711 • Published Oct 28, 2025 • 19

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29, 2025 • 44

upvoted a paper 3 months ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 165

liked a Space 6 months ago

Tar

🚀 • Unified MLLM with Text-Aligned Representations

liked a Space 7 months ago

BAGEL

🚀 • 215 • Demo for BAGEL

liked a dataset 8 months ago

BLIP3o/BLIP3o-Pretrain-Long-Caption

Viewer • Updated Jun 26, 2025 • 27.2M • 21.1k • 56

liked a model 8 months ago

deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1, 2025 • 67.2k • 3.55k

liked a dataset 8 months ago

BLIP3o/BLIP3o-60k

Viewer • Updated May 25, 2025 • 7.1k • 1.35k • 33

liked a Space 8 months ago

Video Generation Leaderboard

📊 • 182 • Text to Video and Image to Video Arena & Leaderboard

updated a model 8 months ago

FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1

Updated Apr 24, 2025

published a model 8 months ago

FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1

Updated Apr 24, 2025

liked a dataset 10 months ago

We-Math/We-Math

Viewer • Updated Aug 13, 2025 • 1.74k • 656 • 34

upvoted 2 papers 10 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14, 2025 • 21
