---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- decoder-only
- flux-compatible
- pytorch
---

# data-archetype/capacitor_decoder

**Capacitor decoder**: a faster, lighter FLUX.2-compatible latent decoder built on the [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae) architecture.

## Decode Speed

| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
|---:|---:|---:|---:|---:|---:|---:|
| `512x512` | `1.85x` | `59.3%` | `11.40` | `21.14` | `391.6 MiB` | `961.9 MiB` |
| `1024x1024` | `3.28x` | `79.1%` | `26.31` | `86.24` | `601.4 MiB` | `2876.4 MiB` |
| `2048x2048` | `4.70x` | `86.4%` | `86.29` | `405.84` | `1437.4 MiB` | `10531.4 MiB` |

These measurements are decode-only. Each image is first encoded once with the same FLUX.2 encoder, the latents are cached in memory, and then both decoders are timed over the same cached latent set.

## 2k PSNR Benchmark

| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | Min (dB) | P5 (dB) | P95 (dB) | Max (dB) |
|---|---:|---:|---:|---:|---:|---:|---:|
| FLUX.2 VAE | 36.28 | 4.53 | 36.07 | 22.73 | 28.89 | 43.63 | 47.38 |
| capacitor_decoder | 36.34 | 4.50 | 36.29 | 23.28 | 29.06 | 43.66 | 47.43 |

| Delta vs FLUX.2 | Mean (dB) | Std (dB) | Median (dB) | Min (dB) | P5 (dB) | P95 (dB) | Max (dB) |
|---|---:|---:|---:|---:|---:|---:|---:|
| capacitor_decoder - FLUX.2 | 0.055 | 0.531 | 0.062 | -1.968 | -0.811 | 0.886 | 2.807 |

Evaluated on `2000` validation images: roughly `2/3` photographs and `1/3` book covers. Each image is encoded once with FLUX.2 and the resulting latents are reused for both decoders.
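For reference, the metric above is standard per-image PSNR in dB. A minimal sketch of how it can be computed for images in `[-1, 1]` (the `data_range = 2.0` default and the function name are illustrative assumptions; the exact evaluation code is not shown here):

```python
import numpy as np

def psnr_db(reference: np.ndarray, reconstruction: np.ndarray, data_range: float = 2.0) -> float:
    """PSNR in dB; data_range is max - min of the pixel scale (2.0 for [-1, 1])."""
    mse = float(np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

The per-image values can then be aggregated into the mean/median/percentile statistics reported in the tables.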
[Results viewer](https://huggingface.co/spaces/data-archetype/capacitor_decoder-results)

## Usage

```python
import torch
from diffusers.models import AutoencoderKLFlux2

from capacitor_decoder import CapacitorDecoder, CapacitorDecoderInferenceConfig


def flux2_patchify_and_whiten(
    latents: torch.Tensor,
    vae: AutoencoderKLFlux2,
) -> torch.Tensor:
    b, c, h, w = latents.shape
    if h % 2 != 0 or w % 2 != 0:
        raise ValueError(f"Expected even FLUX.2 latent grid, got H={h}, W={w}")
    z = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    z = z.permute(0, 1, 3, 5, 2, 4).reshape(b, c * 4, h // 2, w // 2)
    mean = vae.bn.running_mean.view(1, -1, 1, 1).to(device=z.device, dtype=torch.float32)
    var = vae.bn.running_var.view(1, -1, 1, 1).to(device=z.device, dtype=torch.float32)
    std = torch.sqrt(var + float(vae.config.batch_norm_eps))
    return (z.to(torch.float32) - mean) / std


device = "cuda"
flux2 = AutoencoderKLFlux2.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="FLUX2-VAE",
    torch_dtype=torch.bfloat16,
).to(device)
decoder = CapacitorDecoder.from_pretrained(
    "data-archetype/capacitor_decoder",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], with H and W divisible by 16

with torch.inference_mode():
    posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
    latent_mean = posterior.latent_dist.mean

    # Default path: match the usual FLUX.2 convention.
    # Whiten here, then let capacitor_decoder unwhiten internally before decode.
    latents = flux2_patchify_and_whiten(latent_mean, flux2)

    recon = decoder.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=CapacitorDecoderInferenceConfig(num_steps=1),
    )
```

Whitening and dewhitening are optional, but they **must** stay consistent. The default above matches the usual FLUX.2 pipeline behavior. If your upstream path already gives you raw patchified decoder-space latents instead, skip whitening upstream and call `decode(..., latents_are_flux2_whitened=False)`.
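To make the consistency requirement concrete, here is a sketch of the inverse transform implied by `flux2_patchify_and_whiten` above (undo the BN whitening, then undo the 2x2 patchify). The function name and the standalone `eps` argument are illustrative, not part of the package API; per the contract above, the decoder applies the unwhitening step internally when latents arrive whitened:

```python
import torch

def flux2_unwhiten_and_unpatchify(
    z: torch.Tensor,
    running_mean: torch.Tensor,
    running_var: torch.Tensor,
    eps: float = 1e-5,
) -> torch.Tensor:
    """Invert flux2_patchify_and_whiten: z * std + mean, then 2x2 unpatchify."""
    b, c4, h2, w2 = z.shape
    if c4 % 4 != 0:
        raise ValueError(f"Expected channels divisible by 4, got {c4}")
    mean = running_mean.view(1, -1, 1, 1).to(torch.float32)
    std = torch.sqrt(running_var.view(1, -1, 1, 1).to(torch.float32) + eps)
    z = z.to(torch.float32) * std + mean
    # Undo the patchify: (b, 4c, h/2, w/2) -> (b, c, h, w).
    c = c4 // 4
    z = z.reshape(b, c, 2, 2, h2, w2)
    z = z.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h2 * 2, w2 * 2)
    return z
```

Round-tripping a latent through the forward transform and this inverse (with matching `eps`) should recover it exactly, which is one way to sanity-check a custom raw-latent pipeline.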
## Details

- Default input contract: FLUX.2 patchified latents with FLUX.2 BN whitening still applied.
- Default decoder behavior: unwhiten with saved FLUX.2 BN running stats, then decode.
- Optional raw-latent mode: disable whitening upstream and call `decode(..., latents_are_flux2_whitened=False)`.
- Reused decoder architecture: [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
- [Technical report](technical_report_capacitor_decoder.md)
- [SemDisDiffAE technical report](https://huggingface.co/data-archetype/semdisdiffae/blob/main/technical_report_semantic.md)

## Citation

```bibtex
@misc{capacitor_decoder,
  title  = {Capacitor Decoder: A Faster, Lighter FLUX.2-Compatible Latent Decoder},
  author = {data-archetype},
  email  = {data-archetype@proton.me},
  year   = {2026},
  month  = apr,
  url    = {https://huggingface.co/data-archetype/capacitor_decoder},
}
```