wangyuchi

YuchiWang

https://wangyuchi369.github.io/

AI & ML interests

Multimodal; Generative Models

Recent Activity

upvoted a paper 6 days ago

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

Paper • 2604.06156 • Published 7 days ago • 9

submitted a paper to Daily Papers 6 days ago

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

Paper • 2604.06156 • Published 7 days ago • 9

authored 2 papers 20 days ago

TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Paper • 2503.07050 • Published Mar 10, 2025

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Paper • 2510.12709 • Published Oct 14, 2025 • 13

New activity in TIGER-Lab/MMEB-V2 4 months ago

PNG file corruption

#8 opened 6 months ago by Lingshaw

updated a collection 6 months ago

SAIL-Embedding

Collection

Omni-modal Embedding Foundation Model • 1 item • Updated Oct 17, 2025 • 2

upvoted a paper 6 months ago

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Paper • 2510.12709 • Published Oct 14, 2025 • 13

authored a paper 7 months ago

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Paper • 2505.22613 • Published May 28, 2025 • 9

upvoted a paper 11 months ago

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Paper • 2505.22613 • Published May 28, 2025 • 9

commented on a paper 11 months ago

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

Paper • 2505.22613 • Published May 28, 2025 • 9

upvoted a paper 11 months ago

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

Paper • 2505.07916 • Published May 12, 2025 • 135

upvoted a paper about 1 year ago

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

Paper • 2503.11647 • Published Mar 14, 2025 • 148

authored a paper over 1 year ago

VidTwin: Video VAE with Decoupled Structure and Dynamics

Paper • 2412.17726 • Published Dec 23, 2024 • 9

upvoted a paper over 1 year ago

VidTwin: Video VAE with Decoupled Structure and Dynamics

Paper • 2412.17726 • Published Dec 23, 2024 • 9

commented on a paper over 1 year ago

VidTwin: Video VAE with Decoupled Structure and Dynamics

Paper • 2412.17726 • Published Dec 23, 2024 • 9

liked a model over 1 year ago

microsoft/VidTok

Updated Apr 5, 2025 • 42

authored 2 papers over 1 year ago

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

Paper • 2310.02071 • Published Oct 3, 2023 • 4

GAIA: Zero-shot Talking Avatar Generation

Paper • 2311.15230 • Published Nov 26, 2023 • 3
