ostris committed · verified
Commit 91cc0c7 · 1 Parent(s): 5c10b08

Update README.md

Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - Tongyi-MAI/Z-Image-Turbo
+ ---
+
+ # Z-Image-Turbo Training Adapter
+
+ This is a training adapter designed for fine-tuning [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).
+ It was made for use with [AI Toolkit](https://github.com/ostris/ai-toolkit) but could potentially be used in other trainers as well. It can
+ also be used as a general de-distillation LoRA at inference time to remove the "Turbo" from "Z-Image-Turbo".
+
+ ### Why is it needed?
+
+ When you train directly on a step-distilled model, the distillation breaks down very quickly, and you lose the step distillation
+ in an unpredictable way. A de-distillation training adapter slows this process down significantly, allowing you to do short training runs while
+ preserving the step distillation (speed).
+
+
+ ### What is the catch?
+
+ This is really just a hack that significantly slows down the breakdown of the distillation when fine-tuning a distilled model. The distillation will
+ still break down over time. That means this adapter works great for shorter runs such as styles, concepts, and
+ characters. However, a long training run will likely break the distillation down to the point where artifacts
+ are produced when the adapter is removed.
+
+ ### How was it made?
+
+ I generated thousands of images at various sizes and aspect ratios using
+ [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo). Then I simply trained a LoRA on those images at a low learning
+ rate (1e-5). This allowed the distillation to break down while preserving the model's existing knowledge.
+
+ ### How does it work?
+
+ Because this adapter has already de-distilled the model, a LoRA trained on top of it no longer absorbs the breakdown of the
+ distillation; it only learns the subject you are training. When it comes time to run inference / sampling, you remove this
+ training adapter, which leaves your new LoRA on top of the distilled model and lets your new information run at distilled speeds.
+ Attached is an example of a short training run on a character with and without this adapter.
+
+ ![zimage_adapter](https://cdn-uploads.huggingface.co/production/uploads/643cb43e6eeb746f5ad81c26/HF2PcFVl4haJzjrNGFHfC.jpeg)
+
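
For illustration, here is a minimal, hypothetical sketch of the inference-time split described under "How does it work?": stack this adapter with your freshly trained LoRA for de-distilled sampling, then drop the adapter to sample at turbo speed. It assumes Z-Image-Turbo and its LoRAs load through diffusers' standard `load_lora_weights` / `set_adapters` interface; the adapter repository id, LoRA path, prompt, and step counts are placeholders, not values taken from this repo.

```python
# Hypothetical sketch (not from this repo): de-distilled vs. turbo-speed sampling.
# Assumes Z-Image-Turbo is supported by diffusers' AutoPipelineForText2Image with
# standard LoRA loading; repo ids, paths, prompts, and step counts are placeholders.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# De-distilled setup (what your LoRA is trained against): this adapter + your LoRA.
pipe.load_lora_weights("ostris/z-image-turbo-training-adapter",  # placeholder repo id
                       adapter_name="de_distill")
pipe.load_lora_weights("path/to/your_lora_dir", weight_name="your_lora.safetensors",
                       adapter_name="my_lora")
pipe.set_adapters(["de_distill", "my_lora"], adapter_weights=[1.0, 1.0])
preview = pipe("photo of my character", num_inference_steps=25).images[0]  # more steps, no longer distilled

# Final inference at turbo speed: drop the training adapter, keep only your LoRA.
pipe.set_adapters(["my_lora"], adapter_weights=[1.0])
fast = pipe("photo of my character", num_inference_steps=8).images[0]  # few-step turbo sampling
```

The same split is what the adapter enables during training: your LoRA learns against the de-distilled model, and the adapter is simply left out when you sample with the finished LoRA.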