ComfyUI version

#1
by Kutches - opened

Does the ComfyUI version need special nodes, or can it be used like normal?

Used like normal, with the Load Diffusion Model node.

Question: now that you've tested it, what's your impression of how fast it is? Also, is the quality a bit better, equal, or slightly worse?

It only uses 2 steps instead of the normal 8, so it's definitely faster, and the quality is basically the same as the base Z-Image-Turbo for me.
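For reference, here is a minimal sketch of what "used like normal" looks like in ComfyUI's API (exported-workflow) format, showing only the Load Diffusion Model (UNETLoader) and KSampler entries; the checkpoint filename, the node ids, and the omitted text-encoder/VAE/latent nodes are assumptions about a default workflow, not fixed names:

```python
# Sketch of the two relevant nodes in an exported ComfyUI API workflow.
# The filename and node ids are placeholders; the CLIP text encode, empty
# latent, VAE decode and save nodes are wired as in the default workflow
# and omitted here.
workflow_fragment = {
    "1": {
        "class_type": "UNETLoader",  # shown in the UI as "Load Diffusion Model"
        "inputs": {
            "unet_name": "TwinFlow-Z-Image-Turbo-comfy-bf16.safetensors",  # placeholder name
            "weight_dtype": "default",
        },
    },
    "2": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["1", 0],          # MODEL output of the loader above
            "positive": ["3", 0],       # CLIP Text Encode (prompt), omitted here
            "negative": ["4", 0],       # CLIP Text Encode (negative), omitted here
            "latent_image": ["5", 0],   # Empty Latent Image, omitted here
            "seed": 0,
            "steps": 2,                 # TwinFlow: 2 steps instead of the usual 8
            "cfg": 1.0,
            "sampler_name": "euler",
            "scheduler": "simple",
            "denoise": 1.0,
        },
    },
}
```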

So I don't know what I'm doing wrong, because it's much worse for me than the original ZIT.
I use the default workflow and fp8 TwinFlow + 2 steps Euler/simple, starting at 1024x1024, but higher resolutions are not better.
[attached image: ComfyUI_00853_]

Use other samplers and schedulers; for instance, here is res_3m with sigmoid_offset:

[attached image: world_0003]

Don't res_3m etc. take twice as much time?
I can also run euler with 4 steps, and yes, it improves quality (but there is terrible pixelated hair in both images, euler 4 steps and res_3m).
[attached image: ComfyUI_00856_]

Other samplers take more time per step, so they may end up taking the same total time as euler. I'm watching this thread to see whether the quality is the same or not. Also, could we please get the fp16 model?

If you are getting a black screen/image, you grabbed the non-Comfy version in the root of the directory. Use the ComfyUI folder.

After extensive grid testing, my favorite is 4 steps with dpmpp_sde + bongtangent as sampler/scheduler; that is the best combo and works well at 4 steps. 2 and 3 steps look OK but rough, and it is still the best combo to use for those. (The res_2 series and beta are also OK, but not as good as dpm+bong.)
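For anyone who wants to repeat this kind of grid test, a small script along these lines can queue one job per sampler/scheduler combination against a locally running ComfyUI server. The workflow_api.json path, the KSampler node id "2", and the availability of the res_* samplers and the bongtangent/sigmoid_offset schedulers (they come from custom node packs such as RES4LYF, not core ComfyUI) are all assumptions:

```python
import copy
import json
import urllib.request

# Grid-test sketch: queue one generation per sampler/scheduler combination on a
# local ComfyUI server. "workflow_api.json" and node id "2" are assumptions
# about your exported API-format workflow, not fixed names.
SAMPLERS = ["euler", "dpmpp_sde", "dpmpp_2m_sde"]  # add res_2m/res_3m if the custom node pack is installed
SCHEDULERS = ["simple", "beta", "bong_tangent"]    # exact registered names depend on the installed node packs

with open("workflow_api.json") as f:
    base_workflow = json.load(f)

for sampler in SAMPLERS:
    for scheduler in SCHEDULERS:
        wf = copy.deepcopy(base_workflow)
        wf["2"]["inputs"]["sampler_name"] = sampler
        wf["2"]["inputs"]["scheduler"] = scheduler
        wf["2"]["inputs"]["steps"] = 4
        req = urllib.request.Request(
            "http://127.0.0.1:8188/prompt",
            data=json.dumps({"prompt": wf}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(sampler, scheduler, resp.read().decode())
```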

I am also seeing some pretty bad quality loss in fine textures. dpmpp_sde isn't a great solution IMO... it's ~2x inference time per step, so at that point you might as well just run normal ZIT at 8 steps in another sampler.

What might help is extracting a LoRA from the TwinFlow model and applying it to ZIT at < 1 strength, and/or with certain blocks pruned.
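For what it's worth, a minimal sketch of that kind of extraction: diff the TwinFlow and base ZIT weights and factor each 2-D delta with a truncated SVD. File names, rank, and the key-naming scheme are assumptions; real tools such as kohya's extract_lora_from_models.py handle naming conventions and conv layers properly.

```python
import torch
from safetensors.torch import load_file, save_file

# Sketch: low-rank (LoRA-style) approximation of (TwinFlow - base ZIT) weights.
# Paths, rank, and key naming are placeholders, not a drop-in recipe.
RANK = 32
base = load_file("z_image_turbo_bf16.safetensors")            # placeholder path
tuned = load_file("twinflow_z_image_turbo_bf16.safetensors")  # placeholder path

lora = {}
for key, w_base in base.items():
    w_tuned = tuned.get(key)
    if w_tuned is None or w_base.ndim != 2:  # only plain 2-D linear weights handled here
        continue
    delta = w_tuned.float() - w_base.float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    r = min(RANK, s.shape[0])
    # Split singular values across both factors: delta ~= (U * sqrt(S)) @ (sqrt(S) * Vh)
    lora[f"{key}.lora_up.weight"] = (u[:, :r] * s[:r].sqrt()).to(torch.bfloat16).contiguous()
    lora[f"{key}.lora_down.weight"] = (s[:r].sqrt().unsqueeze(1) * vh[:r]).to(torch.bfloat16).contiguous()

save_file(lora, "twinflow_delta_lora.safetensors")
```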

I've also observed a significant drop in quality in ComfyUI, especially in details like hair and skin. The original z-image model's skin and hair textures were very close to real photos, with extremely detailed hair strands and realistic skin textures. The Twinflow version of the model reminds me of the style of some lower-quality quantized Flux models, with high contrast and blurry details (the jagged edges of the hair are very noticeable), and the skin is smooth, giving it a greasy look.

This is especially noticeable at high resolutions. Tweaking the sampling settings can alleviate this, but it doesn't bring the image back to the level of the original z-image.

My experience is that if you want to generate 1080p images, you need at least 5 steps. The best sampling method is dpmpp_2m_sde_gpu (dpmpp_2m_sde is slightly better, but slower). Additionally, lowering the CFG to 0.9-0.7 adds more detail while reducing contrast, but it makes the details blurrier. At 0.6 you get contrast similar to the original model, but the details are very blurry. You can further increase the steps to counteract this blurriness, but the contrast will also increase; it's a trade-off between contrast and detail. Also, this model has an efficiency penalty for any CFG other than 1.0 (a CFG different from 1.0 forces an extra unconditional pass per step); inference efficiency is highest at 1.0.
In short, I got decent 1920x1080 images with steps=6, cfg=0.7, and dpmpp_2m_sde_gpu. My 5090 took about 9 seconds per image. SageAttention and fast fp16 didn't help much with this model.

For comparison, the original z-image-turbo model with steps=8, euler, cfg=1.0, and with SageAttention and fast fp16 enabled, only takes about 7-8 seconds, and the quality is far superior to TwinFlow.

z-image-turbo is already a model that has undergone step distillation, making Twinflow basically pointless.

prompt: "Latina female with thick wavy hair, harbor boats and pastel houses behind. Breezy seaside light, warm tones, cinematic close-up. "

Z-Image-Turbo-bf16 (steps=8, cfg=1.0, sample=euler, seed=866845136969323):

[image]

TwinFlow-Z-Image-Turbo-comfy-bf16 (steps=6, cfg=0.7, sample=dpmpp_2m_sde_gpu, seed=866845136969323):

[image]

TwinFlow-Z-Image-Turbo-comfy-bf16 (steps=4, cfg=0.7, sample=dpmpp_2m_sde_gpu, seed=866845136969323):

[image]

TwinFlow-Z-Image-Turbo-comfy-bf16 (steps=2, cfg=0.7, sample=dpmpp_2m_sde_gpu, seed=866845136969323):

[image]

Using 2m is your problem (speed-wise); it doesn't need 2m, plain dpmpp_sde (much faster) works fine. (And since SDE is non-deterministic, I'm looking again to see which other sampler is as good.) But TwinFlow has a place and is faster. Yes, if you want the very best quality, there are likely better choices, but as a good fast version? TwinFlow is a good model.

Not true: dpmpp_sde is much slower than dpmpp_2m_sde. Generating a 1080p image (cfg=0.7) on a 5090 GPU takes approximately 2.80 s/it with dpmpp_sde, while dpmpp_2m_sde takes about 1.68 s/it. (dpmpp_sde is a second-order sampler that evaluates the model twice per step, while dpmpp_2m_sde is a multistep method that reuses the previous evaluation.)
Furthermore, dpmpp_sde shows significantly greater image differences compared to the original model.

TwinFlow-Z-Image-Turbo-comfy-bf16 (steps=4, cfg=0.7, sample=dpmpp_sde, seed=866845136969323):

[image]

You can clearly see that the character's size and proportions are completely different from those generated by the original model. Although dpmpp_2m_sde is also not very sharp, at least the character proportions are correct.

In fact, according to the TwinFlow source code, this model requires a special sampling method: during generation it needs to accept an additional target-based time vector tt. This essentially makes solving the ODE or SDE very straightforward, allowing it to reach the endpoint in very few steps. You can see that they added a new t_embedder_2 module to the z_image weights, which is used to embed this tt. Current ComfyUI doesn't recognize this tensor and cannot use its information, so the inferred images are always blurry; this is essentially a bug.
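To illustrate the idea only (this is not TwinFlow's or any released node's actual code): the second embedder would have to be called with the target time and merged into the conditioning the transformer blocks receive, roughly like the wrapper below. The attribute names other than t_embedder_2, and the additive combination rule, are assumptions based on the description above.

```python
import torch.nn as nn

# Conceptual sketch of the dual-time conditioning described above. Everything
# except the existence of t_embedder_2 is an assumption, not TwinFlow's code.
class DualTimeWrapper(nn.Module):
    def __init__(self, diffusion_model):
        super().__init__()
        self.inner = diffusion_model  # Z-Image transformer carrying t_embedder and t_embedder_2

    def forward(self, x, timestep, target_timestep, **kwargs):
        t_emb = self.inner.t_embedder(timestep)            # what vanilla ComfyUI already computes
        t2_emb = self.inner.t_embedder_2(target_timestep)  # ignored by vanilla ComfyUI, so its info is lost
        combined = t_emb + t2_emb                          # assumed combination rule
        # hypothetical entry point that runs the transformer blocks with the merged embedding
        return self.inner.run_blocks(x, combined, **kwargs)
```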

If you want to reproduce the results shown on their website, you need to manually create some nodes yourself, and then you will get the intended behavior of TwinFlow:

TwinFlow-Z-Image-Turbo-comfy-bf16 (steps=2, gap_end=0.8, sample=TwinFlow Sampler + Scheduler, seed=866845136969323):
[image]

This image only requires 2 steps and takes a total of 3 seconds. The workflow is as follows:
[workflow screenshot]

Where did you get those nodes? And if you authored them, please do share.

I wrote this myself, reimplementing the Model Patcher, Sampler, and Scheduler based on TwinFlow's source code and ComfyUI's logic.
It's much lighter than the node created by directly throwing the TwinFlow source code into ComfyUI:

https://github.com/smthemex/ComfyUI_TwinFlow

It also conforms better to ComfyUI's specifications and composability. The inputs and outputs are standard node types, and it doesn't rely on flash attention (smthemex's node relies on flash attention because it directly uses TwinFlow's code). You can freely insert LoRAs and other things into the workflow, and try things like Sage Attention.
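As a rough illustration of what "standard inputs and outputs" means here (this is emphatically not the author's code): one common way a custom sampler is exposed to SamplerCustom is by returning a comfy.samplers.KSAMPLER built from a plain sampling function. A minimal sketch, where the update rule is just an Euler placeholder rather than TwinFlow's method:

```python
import comfy.samplers

# Minimal sketch of exposing a custom sampler as a standard SAMPLER output.
# The update rule below is a plain Euler placeholder, NOT TwinFlow's method.
def placeholder_sampler_function(model, x, sigmas, extra_args=None, callback=None, disable=None):
    extra_args = {} if extra_args is None else extra_args
    for i in range(len(sigmas) - 1):
        sigma = sigmas[i] * x.new_ones([x.shape[0]])
        denoised = model(x, sigma, **extra_args)
        d = (x - denoised) / sigmas[i]           # standard k-diffusion derivative
        x = x + d * (sigmas[i + 1] - sigmas[i])  # Euler step to the next sigma
        if callback is not None:
            callback({"i": i, "x": x, "denoised": denoised, "sigma": sigmas[i]})
    return x

class PlaceholderSamplerNode:
    # Plugs into SamplerCustom / SamplerCustomAdvanced like any built-in sampler.
    RETURN_TYPES = ("SAMPLER",)
    FUNCTION = "get_sampler"
    CATEGORY = "sampling/custom_sampling/samplers"

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {}}

    def get_sampler(self):
        return (comfy.samplers.KSAMPLER(placeholder_sampler_function),)
```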

I'm still making adjustments, and it doesn't support Qwen yet. Once there are no major issues, I will consider open-sourcing it.

Please do open source it and then iterate on adjustments. That will also allow others to validate your results.

Finished and released https://github.com/mengqin/ComfyUI-TwinFlow/
