MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control Paper • 2604.06156 • Published 7 days ago • 9
MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control Paper • 2604.06156 • Published 7 days ago • 9
TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation Paper • 2503.07050 • Published Mar 10, 2025
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model Paper • 2510.12709 • Published Oct 14, 2025 • 13
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model Paper • 2510.12709 • Published Oct 14, 2025 • 13
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Paper • 2505.22613 • Published May 28, 2025 • 9
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Paper • 2505.22613 • Published May 28, 2025 • 9
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Paper • 2505.22613 • Published May 28, 2025 • 9 • 2
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published May 12, 2025 • 135
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14, 2025 • 148
VidTwin: Video VAE with Decoupled Structure and Dynamics Paper • 2412.17726 • Published Dec 23, 2024 • 9
VidTwin: Video VAE with Decoupled Structure and Dynamics Paper • 2412.17726 • Published Dec 23, 2024 • 9
VidTwin: Video VAE with Decoupled Structure and Dynamics Paper • 2412.17726 • Published Dec 23, 2024 • 9 • 3
Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond Paper • 2310.02071 • Published Oct 3, 2023 • 4