Update README.md

README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
- ---

---
license: apache-2.0
base_model:
- Tongyi-MAI/Z-Image-Turbo
---

# Z-Image-Turbo Training Adapter

This is a training adapter designed for fine-tuning [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).
It was made for use with [AI Toolkit](https://github.com/ostris/ai-toolkit), but it could potentially be used in other trainers as well. It can
also be used as a general de-distillation LoRA at inference time to remove the "Turbo" from "Z-Image-Turbo".
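
Below is a minimal sketch of that inference-only, de-distillation use, assuming a diffusers-style pipeline for Z-Image-Turbo that
supports the standard `load_lora_weights()` call; the weight path, step count, and guidance scale are illustrative placeholders,
not settings taken from this repo.

```python
import torch
from diffusers import DiffusionPipeline

# Load the base Turbo model (assumes a diffusers-compatible pipeline is available for it).
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# Apply this adapter as a regular LoRA to "de-turbo" the model for inference.
# Replace the path with wherever you saved the adapter weights.
pipe.load_lora_weights("path/to/z_image_turbo_training_adapter.safetensors")

# With the adapter applied, sample like a non-distilled model:
# more steps and real CFG instead of few-step Turbo settings.
image = pipe(
    "a cozy cabin in a snowy forest at dusk",
    num_inference_steps=30,
    guidance_scale=4.0,
).images[0]
image.save("de_distilled_sample.png")
```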

### Why is it needed?

When you train directly on a step-distilled model, the distillation breaks down very quickly, and you lose the step distillation
in an unpredictable way. A de-distillation training adapter slows this process down significantly, allowing you to do short
training runs while preserving the step distillation (speed).

### What is the catch?

This is really just a hack that significantly slows down the breakdown of the distillation when fine-tuning a distilled model; the
distillation will still degrade over time. In practice, this adapter works great for shorter runs such as styles, concepts, and
characters. However, a long training run will likely break the distillation down to the point where artifacts appear when the
adapter is removed.

### How was it made?

I generated thousands of images at various sizes and aspect ratios using
[Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), then trained a LoRA on those images at a low learning
rate (1e-5). This allowed the distillation to break down while preserving the model's existing knowledge.
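
As an illustration of that process (not the exact script used for this adapter), a bucketed generation loop with a diffusers-style
pipeline could look roughly like this; the resolutions, prompts, and output layout are assumptions.

```python
import os
import torch
from diffusers import DiffusionPipeline

# Assumes a diffusers-compatible pipeline for Z-Image-Turbo.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# Illustrative resolution buckets covering several aspect ratios.
buckets = [(1024, 1024), (832, 1216), (1216, 832), (896, 1152), (1152, 896)]
# In practice this would be a large, varied prompt list.
prompts = [
    "a street market at night, film photo",
    "portrait of an elderly fisherman, overcast light",
    "an isometric illustration of a tiny greenhouse",
]

os.makedirs("dataset", exist_ok=True)
idx = 0
for prompt in prompts:
    for width, height in buckets:
        image = pipe(prompt, width=width, height=height).images[0]
        image.save(f"dataset/{idx:06d}.png")
        # Save the prompt as a caption sidecar so a LoRA trainer can pick it up.
        with open(f"dataset/{idx:06d}.txt", "w") as f:
            f.write(prompt)
        idx += 1
```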

### How does it work?

Because this adapter has already broken the distillation down, a LoRA trained on top of it no longer breaks the distillation down
itself; your new LoRA only learns the subject you are training. When it comes time to run inference / sampling, you remove this
training adapter, which leaves your new information on the distilled model and allows it to run at distilled speeds. Attached is
an example of a short training run on a character with and without this adapter.


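
To make the workflow concrete, here is a hedged sketch of the inference side once training is done, again assuming a diffusers-style
pipeline with LoRA support; the file path, step count, and guidance value are placeholders rather than recommended settings.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# Load ONLY the LoRA you trained on top of this adapter; the training adapter
# itself is left out, so the base model keeps its Turbo step distillation.
pipe.load_lora_weights("path/to/your_character_lora.safetensors")

# Few-step, low-CFG sampling as with the plain Turbo model (values are illustrative).
image = pipe(
    "photo of your character hiking in the mountains",
    num_inference_steps=8,
    guidance_scale=1.0,
).images[0]
image.save("turbo_sample.png")
```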