- Is Noise Conditioning Necessary for Denoising Generative Models?
  Paper • 2502.13129 • Published • 1
- REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
  Paper • 2504.10483 • Published • 21
- Mean Flows for One-step Generative Modeling
  Paper • 2505.13447 • Published • 7
- Latent Diffusion Model without Variational Autoencoder
  Paper • 2510.15301 • Published • 48

Collections including paper arxiv:2504.10483

- Boosting Generative Image Modeling via Joint Image-Feature Synthesis
  Paper • 2504.16064 • Published • 14
- LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
  Paper • 2504.14032 • Published • 7
- Towards Understanding Camera Motions in Any Video
  Paper • 2504.15376 • Published • 157
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 120

- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88
- Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
  Paper • 2503.07703 • Published • 37
- Gemini Embedding: Generalizable Embeddings from Gemini
  Paper • 2503.07891 • Published • 45
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
  Paper • 2503.07572 • Published • 47

- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
  Paper • 2411.04952 • Published • 30
- Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
  Paper • 2411.05005 • Published • 13
- M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
  Paper • 2411.04075 • Published • 17
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19

- Test-Time Scaling with Reflective Generative Model
  Paper • 2507.01951 • Published • 107
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 151
- Autoregressive Diffusion Models
  Paper • 2110.02037 • Published
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 8

- An Empirical Study of GPT-4o Image Generation Capabilities
  Paper • 2504.05979 • Published • 64
- Antidistillation Sampling
  Paper • 2504.13146 • Published • 59
- Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
  Paper • 2504.13169 • Published • 39
- WORLDMEM: Long-term Consistent World Simulation with Memory
  Paper • 2504.12369 • Published • 35

- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
  Paper • 2412.09619 • Published • 28
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 48
- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 77