Trained for 0 epochs and 500 steps.
Trained with datasets ['text-embeds', 'mj-v6'].
Learning rate 8e-06, batch size 32, and 3 gradient accumulation steps.
Used the DDPM noise scheduler for training, with epsilon prediction type and rescaled_betas_zero_snr=False.
Used 'trailing' timestep spacing.
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
VAE: madebyollin/sdxl-vae-fp16-fix
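For reference, the noise-scheduler settings above map onto the diffusers `DDPMScheduler` configuration roughly as follows. This is a minimal sketch assuming a recent diffusers release that exposes these config keys; it is not the actual training script.

```python
from diffusers import DDPMScheduler

# Sketch only: a DDPM noise schedule with the settings listed above
# (epsilon prediction, rescale_betas_zero_snr=False, 'trailing' spacing).
noise_scheduler = DDPMScheduler(
    prediction_type="epsilon",
    rescale_betas_zero_snr=False,
    timestep_spacing="trailing",
)
print(noise_scheduler.config.timestep_spacing)  # 'trailing'
```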
- .gitattributes +1 -0
- README.md +111 -0
- optimizer.bin +3 -0
- random_states_0.pkl +3 -0
- scheduler.bin +3 -0
- training_state-mj-v6.json +3 -0
- training_state.json +1 -0
- transformer/config.json +30 -0
- transformer/diffusion_pytorch_model.safetensors +3 -0
.gitattributes
CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 assets/image_0_0.png filter=lfs diff=lfs merge=lfs -text
 assets/image_1_0.png filter=lfs diff=lfs merge=lfs -text
+training_state-mj-v6.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,111 @@

---
license: creativeml-openrail-m
base_model: "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
  - diffusers
  - full

inference: true
widget:
- text: 'unconditional (blank prompt)'
  parameters:
    negative_prompt: 'blurry, cropped, ugly'
  output:
    url: ./assets/image_0_0.png
- text: 'ethnographic photography of teddy bear at a picnic'
  parameters:
    negative_prompt: 'blurry, cropped, ugly'
  output:
    url: ./assets/image_1_0.png
---

# pixart-training

This is a full rank finetune derived from [PixArt-alpha/PixArt-Sigma-XL-2-1024-MS](https://huggingface.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS).

The main validation prompt used during training was:

```
ethnographic photography of teddy bear at a picnic
```

## Validation settings
- CFG: `7.5`
- CFG Rescale: `0.0`
- Steps: `30`
- Sampler: `euler`
- Seed: `42`
- Resolution: `1024`

Note: The validation settings are not necessarily the same as the [training settings](#training-settings).

You can find some example images in the following gallery:

<Gallery />

The text encoder **was not** trained.
You may reuse the base model text encoder for inference.

## Training settings

- Training epochs: 0
- Training steps: 500
- Learning rate: 8e-06
- Effective batch size: 96 (derivation shown below)
- Micro-batch size: 32
- Gradient accumulation steps: 3
- Number of GPUs: 1
- Prediction type: epsilon
- Rescaled betas zero SNR: False
- Optimizer: AdamW, stochastic bf16
- Precision: Pure BF16
- Xformers: Enabled
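The effective batch size listed above is simply the per-GPU micro-batch multiplied by the gradient accumulation steps and the number of GPUs. A minimal sketch of that arithmetic, using the numbers from the list (the variable names are illustrative, not taken from the training code):

```python
# Illustrative only: how the "Effective batch size: 96" figure is derived.
micro_batch_size = 32            # samples per forward/backward pass
gradient_accumulation_steps = 3  # passes accumulated before each optimizer step
num_gpus = 1

effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
assert effective_batch_size == 96
```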
## Datasets

### mj-v6
- Repeats: 0
- Total number of images: 199872
- Total number of aspect buckets: 1
- Resolution: 1.0 megapixels
- Cropped: False
- Crop style: None
- Crop aspect: None
## Inference

```python
import torch
from diffusers import DiffusionPipeline

model_id = "pixart-training"
prompt = "ethnographic photography of teddy bear at a picnic"
negative_prompt = "malformed, disgusting, overexposed, washed-out"

pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline.to('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    generator=torch.Generator(device='cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu').manual_seed(1641421826),
    width=1152,
    height=768,
    guidance_scale=7.5,
    guidance_rescale=0.0,
).images[0]
image.save("output.png", format="PNG")
```
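The validation settings above list `euler` as the sampler, seed `42`, `1024` resolution, and Pure BF16 precision, while the example uses the pipeline's default scheduler. The sketch below shows one way to mirror those validation choices at inference time; it assumes standard diffusers APIs (`EulerDiscreteScheduler`, `torch_dtype`) and is not the exact validation code used during training.

```python
import torch
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

# Assumption: load this repository's weights in bf16 and swap in an Euler
# scheduler with 'trailing' timestep spacing, mirroring the card's
# validation sampler and the training timestep spacing.
pipeline = DiffusionPipeline.from_pretrained("pixart-training", torch_dtype=torch.bfloat16)
pipeline.scheduler = EulerDiscreteScheduler.from_config(
    pipeline.scheduler.config, timestep_spacing="trailing"
)
pipeline.to("cuda" if torch.cuda.is_available() else "cpu")

image = pipeline(
    prompt="ethnographic photography of teddy bear at a picnic",
    negative_prompt="blurry, cropped, ugly",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=torch.Generator("cpu").manual_seed(42),
    width=1024,
    height=1024,
).images[0]
image.save("validation_style.png", format="PNG")
```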
optimizer.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:58125cbdeee71875e41dcb0364eca7fb41c0768eee8e8f8c72612c9376012283
size 3665677155
random_states_0.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2cceeb55a33a3db4f1a295e5aa0a8fcea8f2638c53ec5216a82c7db9b65c4858
size 14344
scheduler.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:57feaeea732a8232dc14923ac8e8cff564f2d6d11728d1405a7f3cfc02efb7ed
size 1000
training_state-mj-v6.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7828c00d6f87d54210b7888c9040dee97e356126dc1d3916106ee737f452288c
size 19126435
training_state.json
ADDED
@@ -0,0 +1 @@
{"global_step": 500, "epoch_step": 500, "epoch": 1, "exhausted_backends": [], "repeats": {}}
transformer/config.json
ADDED
@@ -0,0 +1,30 @@
{
  "_class_name": "PixArtTransformer2DModel",
  "_diffusers_version": "0.29.0",
  "_name_or_path": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
  "activation_fn": "gelu-approximate",
  "attention_bias": true,
  "attention_head_dim": 72,
  "attention_type": "default",
  "caption_channels": 4096,
  "cross_attention_dim": 1152,
  "double_self_attention": false,
  "dropout": 0.0,
  "in_channels": 4,
  "interpolation_scale": 2,
  "norm_elementwise_affine": false,
  "norm_eps": 1e-06,
  "norm_num_groups": 32,
  "norm_type": "ada_norm_single",
  "num_attention_heads": 16,
  "num_embeds_ada_norm": 1000,
  "num_layers": 28,
  "num_vector_embeds": null,
  "only_cross_attention": false,
  "out_channels": 8,
  "patch_size": 2,
  "sample_size": 128,
  "upcast_attention": false,
  "use_additional_conditions": false,
  "use_linear_projection": false
}
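The config above describes the finetuned `PixArtTransformer2DModel` shipped in the `transformer/` subfolder. If you prefer to load just these transformer weights and reuse the remaining components from the base model, something along these lines should work with a recent diffusers release; the repository path and dtype here are assumptions, not values from the card.

```python
import torch
from diffusers import PixArtSigmaPipeline, PixArtTransformer2DModel

repo_id = "pixart-training"  # assumption: local path or Hub id of this repository

# Load only the finetuned transformer from this repo's `transformer/` subfolder...
transformer = PixArtTransformer2DModel.from_pretrained(
    repo_id, subfolder="transformer", torch_dtype=torch.bfloat16
)

# ...and plug it into the base PixArt-Sigma pipeline for the VAE, text encoder, etc.
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda" if torch.cuda.is_available() else "cpu")
```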
transformer/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b106bfee3490f721f128596f246bffc8dc8e9d711ef62f21a1532186ba50e5ad
size 1221780352