# data-archetype/capacitor_decoder
Capacitor decoder: a faster, lighter FLUX.2-compatible latent decoder built on the SemDisDiffAE architecture.
## Decode Speed
| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
|---|---|---|---|---|---|---|
| 512x512 | 1.85x | 59.3% | 11.40 | 21.14 | 391.6 MiB | 961.9 MiB |
| 1024x1024 | 3.28x | 79.1% | 26.31 | 86.24 | 601.4 MiB | 2876.4 MiB |
| 2048x2048 | 4.70x | 86.4% | 86.29 | 405.84 | 1437.4 MiB | 10531.4 MiB |
These measurements are decode-only. Each image is first encoded once with the same FLUX.2 encoder, latents are cached in memory, and then both decoders are timed over the same cached latent set.
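The speedup and VRAM-reduction columns are derived directly from the raw per-image timings and peak-VRAM figures in the table; for example, for the 1024x1024 row:

```python
# Derived columns for the 1024x1024 row of the decode-speed table.
decoder_ms, flux2_ms = 26.31, 86.24      # ms/image
decoder_mib, flux2_mib = 601.4, 2876.4   # peak VRAM, MiB

speedup = flux2_ms / decoder_ms              # FLUX.2 time / capacitor_decoder time
vram_reduction = 1.0 - decoder_mib / flux2_mib

print(f"{speedup:.2f}x")        # 3.28x
print(f"{vram_reduction:.1%}")  # 79.1%
```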
## 2k PSNR Benchmark
| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | Min (dB) | P5 (dB) | P95 (dB) | Max (dB) |
|---|---|---|---|---|---|---|---|
| FLUX.2 VAE | 36.28 | 4.53 | 36.07 | 22.73 | 28.89 | 43.63 | 47.38 |
| capacitor_decoder | 36.34 | 4.50 | 36.29 | 23.28 | 29.06 | 43.66 | 47.43 |

| Delta vs FLUX.2 | Mean (dB) | Std (dB) | Median (dB) | Min (dB) | P5 (dB) | P95 (dB) | Max (dB) |
|---|---|---|---|---|---|---|---|
| capacitor_decoder - FLUX.2 | 0.055 | 0.531 | 0.062 | -1.968 | -0.811 | 0.886 | 2.807 |
Evaluated on 2000 validation images: roughly 2/3
photographs and 1/3 book covers. Each image is encoded once with FLUX.2 and
reused for both decoders.
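As a reference for how the dB figures above are defined, a minimal PSNR implementation in NumPy (assuming images mapped to `[0, 1]`, i.e. `data_range=1.0`; the benchmark's exact preprocessing is not restated here):

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range**2 / mse))

# A uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB at data_range 1.0.
ref = np.zeros((3, 64, 64))
print(psnr(ref, ref + 0.1))  # ~20.0
```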
## Usage
```python
import torch
from diffusers.models import AutoencoderKLFlux2
from capacitor_decoder import CapacitorDecoder, CapacitorDecoderInferenceConfig


def flux2_patchify_and_whiten(
    latents: torch.Tensor,
    vae: AutoencoderKLFlux2,
) -> torch.Tensor:
    b, c, h, w = latents.shape
    if h % 2 != 0 or w % 2 != 0:
        raise ValueError(f"Expected even FLUX.2 latent grid, got H={h}, W={w}")
    z = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    z = z.permute(0, 1, 3, 5, 2, 4).reshape(b, c * 4, h // 2, w // 2)
    mean = vae.bn.running_mean.view(1, -1, 1, 1).to(device=z.device, dtype=torch.float32)
    var = vae.bn.running_var.view(1, -1, 1, 1).to(device=z.device, dtype=torch.float32)
    std = torch.sqrt(var + float(vae.config.batch_norm_eps))
    return (z.to(torch.float32) - mean) / std


device = "cuda"

flux2 = AutoencoderKLFlux2.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="FLUX2-VAE",
    torch_dtype=torch.bfloat16,
).to(device)

decoder = CapacitorDecoder.from_pretrained(
    "data-archetype/capacitor_decoder",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], with H and W divisible by 16

with torch.inference_mode():
    posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
    latent_mean = posterior.latent_dist.mean

    # Default path: match the usual FLUX.2 convention.
    # Whiten here, then let capacitor_decoder unwhiten internally before decode.
    latents = flux2_patchify_and_whiten(latent_mean, flux2)

    recon = decoder.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=CapacitorDecoderInferenceConfig(num_steps=1),
    )
```
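The 2x2 patchify step inside `flux2_patchify_and_whiten` can be sanity-checked with a small NumPy equivalent (the shapes here are hypothetical, chosen only to illustrate the rearrangement; whitening is omitted):

```python
import numpy as np

# Hypothetical latent: 16 channels on an 8x8 spatial grid.
b, c, h, w = 1, 16, 8, 8
z = np.arange(b * c * h * w, dtype=np.float32).reshape(b, c, h, w)

# Fold each 2x2 spatial patch into the channel dimension:
# (b, c, h, w) -> (b, 4c, h/2, w/2), matching the torch reshape/permute above.
p = z.reshape(b, c, h // 2, 2, w // 2, 2)
p = p.transpose(0, 1, 3, 5, 2, 4).reshape(b, c * 4, h // 2, w // 2)

print(p.shape)  # (1, 64, 4, 4)
# Output channel k*4 + sh*2 + sw holds input channel k at spatial offset (sh, sw):
assert p[0, 1, 0, 0] == z[0, 0, 0, 1]
assert p[0, 2, 0, 0] == z[0, 0, 1, 0]
```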
Whitening and dewhitening are optional, but they must stay consistent. The
default above matches the usual FLUX.2 pipeline behavior. If your upstream path
already gives you raw patchified decoder-space latents instead, skip whitening
upstream and call `decode(..., latents_are_flux2_whitened=False)`.
## Details
- Default input contract: FLUX.2 patchified latents with FLUX.2 BN whitening still applied.
- Default decoder behavior: unwhiten with saved FLUX.2 BN running stats, then decode.
- Optional raw-latent mode: disable whitening upstream and call `decode(..., latents_are_flux2_whitened=False)`.
- Reused decoder architecture: SemDisDiffAE
- Technical report
- SemDisDiffAE technical report
## Citation
```bibtex
@misc{capacitor_decoder,
  title  = {Capacitor Decoder: A Faster, Lighter FLUX.2-Compatible Latent Decoder},
  author = {data-archetype},
  email  = {data-archetype@proton.me},
  year   = {2026},
  month  = apr,
  url    = {https://huggingface.co/data-archetype/capacitor_decoder},
}
```