arxiv:2604.14914

Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

Published on Apr 16 · Submitted by Léopold Maillard on Apr 17
Abstract

State-of-the-art text-to-3D generative models fall into latent sink traps where they lose sensitivity to text prompts; a more robust editing framework overcomes this by decoupling geometric representation from linguistic sensitivity.

AI-generated summary

Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based editing, style transfer, and inverse problems. However, it relies on the assumption that generative models remain sensitive to natural language prompts. We demonstrate that for state-of-the-art native text-to-3D generative models, this assumption often collapses. We identify a critical failure mode where generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. In these regimes, changes to the input text fail to alter internal representations in a way that affects the output geometry. Crucially, we observe that this is not a limitation of the model's geometric expressivity; the same generative models can produce a vast diversity of shapes but, as we demonstrate, become insensitive to out-of-distribution text guidance. We investigate this behavior by analyzing the sampling trajectories of the generative model, and find that complex geometries can still be represented and produced by leveraging the model's unconditional generative prior. This leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity. Our approach addresses the limitations of current 3D pipelines and enables high-fidelity semantic manipulation of out-of-distribution 3D shapes. Project webpage: https://daidedou.sorpi.fr/publication/beyondprompts

Community

The paper studies text-driven inversion of 3D generative models. It establishes the existence of sink traps: during generation, the model can become insensitive to prompts, effectively collapsing to a single shape. Despite this failure mode, the models retain strong geometric expressiveness in their unconditional distribution. The paper demonstrates this finding by proposing a novel pose-retargeting editing pipeline based on unconditional 3D inversion of out-of-distribution shapes.
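The sink-trap phenomenon described above (prompt changes barely move the output, while the latent path remains expressive) can be caricatured in a few lines. The saturating conditioning path, the weight shapes, and all names here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

# Toy "sink trap": a generator whose conditioning path is saturated, so
# perturbing the prompt embedding barely changes the output, while the
# same network stays expressive through its unconditional latent input.
rng = np.random.default_rng(1)
W_c = rng.standard_normal((8, 32)) * 0.01   # weak, saturated conditioning path
W_z = rng.standard_normal((16, 32))          # expressive unconditional path

def generate(z, c):
    # Conditioning passes through a flat tanh region (the "sink"),
    # while the latent path is linear and sensitive.
    return np.tanh(10.0 * (c @ W_c + 5.0)) + z @ W_z

z, c = rng.standard_normal(16), rng.standard_normal(8)
base = generate(z, c)

# Perturb the prompt embedding and the latent by the same magnitude.
dc = 0.1 * rng.standard_normal(8)
dz = 0.1 * rng.standard_normal(16)
prompt_shift = np.linalg.norm(generate(z, c + dc) - base)
latent_shift = np.linalg.norm(generate(z + dz, c) - base)
print(prompt_shift, latent_shift)   # latent_shift dwarfs prompt_shift
```

The asymmetry between the two shifts is the point: text guidance is dead in the trapped regime, yet the unconditional direction still moves the output, which is what the proposed pipeline exploits.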


Get this paper in your agent:

hf papers read 2604.14914
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
