
Got bad results: quantized with 1024 images using sdxl-turbo, v1.3.1 normal inference, following your steps #9

Open
greasebig opened this issue Jun 28, 2024 · 17 comments

Comments

@greasebig

No description provided.

@greasebig (Author)

greasebig commented Jun 28, 2024

I am really curious whether your results can be fully reproduced. Using your provided quantization config gives good results, but when I quantize sdxl-turbo from "scratch", I cannot get the expected results, only blurry images.

@A-suozhang (Member)

Apologies for the delayed response. There may be an underlying issue in your quantization process. Could you please provide more detailed information about your experimental settings (what does "from scratch" mean here), so that we can help you with the problem?

@greasebig (Author)

Here is my process (a driver-script sketch of the same steps follows the list):
1.1 Generate Calibration Data : CUDA_VISIBLE_DEVICES=$1 python scripts/gen_calib_data.py --config ./configs/stable-diffusion/$config_name --save_image_path ./debug_imgs
1.2 Post Training Quantization (PTQ) Process : CUDA_VISIBLE_DEVICES=$2 python scripts/ptq.py --config ./configs/stable-diffusion/${cfg_name} --outdir ./logs/$1 --seed 42
1.3 Inference Quantized Model : CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8
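
For reference, this is the whole pipeline I ran as one driver script (a minimal sketch; the config name, GPU id, and log directory are placeholders from my setup, not repo defaults):

```python
# Minimal sketch of the three steps as one driver; CONFIG and LOGDIR are
# placeholders from my setup, not defaults shipped with the repo.
import os
import subprocess

CONFIG = "./configs/stable-diffusion/sdxl-turbo.yaml"  # assumed config name
LOGDIR = "./logs/sdxl-turbo-1024fp32"

def run(cmd):
    # Echo and execute one pipeline step on GPU 0, aborting on failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True,
                   env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"})

# 1.1 Generate calibration data
run(["python", "scripts/gen_calib_data.py", "--config", CONFIG,
     "--save_image_path", "./debug_imgs"])
# 1.2 Post Training Quantization (PTQ)
run(["python", "scripts/ptq.py", "--config", CONFIG,
     "--outdir", LOGDIR, "--seed", "42"])
# 1.3 Inference with the quantized model
run(["python", "scripts/quant_txt2img.py", "--base_path", LOGDIR,
     "--batch_size", "2", "--num_imgs", "8"])
```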

After the above process, I got results like this:

[image]

@greasebig (Author)

Then, I tried the Mixed Precision Search process:
Phase 1: PTQ : python scripts/ptq.py --config ./configs/stable-diffusion/sdxl-turbo.yaml --outdir --seed 42
Phase 2: Get Sensitivity : ...
Phase 3: Integer Programming : ...
Phase 4: Choose the optimal config : ...
Inference with the mixed precision quantized model (I used my own optimal config obtained from the above process):
python scripts/quant_txt2img.py --base_path ./logs/sdxl-turbo-1024fp32 --config_weight_mp ./logs/sdxl-turbo-1024fp32/weight_4.73_0.96.yaml --config_act_mp ./logs/sdxl-turbo-1024fp32/act_7.50_0.95.yaml --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt --image_folder ./logs/sdxl-turbo-1024fp32/generated_images_weight_4.73_0.96_act_7.50_0.95

Got results like this:
[image]

@A-suozhang (Member)

> Here is my process: 1.1 Generate Calibration Data […] 1.2 Post Training Quantization (PTQ) […] 1.3 Inference Quantized Model […]
>
> After the above process, I got results like this: [image]

This process conducts uniform bit-width W8A8 quantization without mixed precision, which produces unsatisfying results. You could try adding --act_protect to the existing command.

@greasebig (Author)

> Then, I tried the Mixed Precision Search process […] python scripts/quant_txt2img.py --base_path ./logs/sdxl-turbo-1024fp32 […]
>
> Got results like this: [image]

I have already tried --act_protect here.

@A-suozhang (Member)

> Then, I tried the Mixed Precision Search process […] Got results like this: [image]
>
> I have already tried --act_protect here.

Actually, I meant adding act_protect to the uniform bit-width W8A8 command:

CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt

@A-suozhang (Member)

> Then, I tried the Mixed Precision Search process […]
>
> Got results like this: [image]

This phenomenon is probably due to a sub-optimal mixed precision configuration. Does the mixed precision search process raise any errors?

@greasebig (Author)

> Actually, I meant adding act_protect to the uniform bit-width W8A8 command:
>
> CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt

[image]
I needed to change your code to apply --act_protect when --config_act_mp is not given; a sketch of the change follows.
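The change I made was along these lines (a hypothetical sketch; the actual identifiers in scripts/quant_txt2img.py differ):

```python
# Hypothetical sketch of the patch: apply the act_protect list even when no
# mixed-precision activation config is given. The names (args, model,
# set_layer_fp) are illustrative, not the script's actual identifiers.
import torch

def apply_act_protect(model, args, set_layer_fp):
    # Originally the protected layers were only restored to FP inside the
    # --config_act_mp branch; lift that check so --act_protect works alone.
    if args.act_protect is not None:
        protected_layers = torch.load(args.act_protect)  # list of layer names
        for name in protected_layers:
            set_layer_fp(model, name)  # keep this layer's activations in FP
```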

@greasebig (Author)

> This phenomenon is probably due to a sub-optimal mixed precision configuration. Does the mixed precision search process raise any errors?

I just followed your steps to search for the mixed precision config and didn't get any errors.

@greasebig (Author)

> Actually, I meant adding act_protect to the uniform bit-width W8A8 command:
>
> CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt

I may try that later.

@A-suozhang (Member)

I see. Uniform W8A8 quantization will generate images with visual degradation. Therefore, we identify the top 1% most sensitive layers and preserve them in FP to maintain performance. To generate images with good quality, act_protect should be specified.

Our mixed precision W8A8 design uses 8-bit for all weights (weight_8.00.yaml) and an average of 7.77 bits for activations, so that, accounting for the FP16-protected layers, the average activation bit-width works out to 8 bits.

(The example configuration we provide:)

# Mixed Precision Quant Inference
WEIGHT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/weight/weight_8.00.yaml"  # [weight_5.02.yaml, weight_8.00.yaml]
ACT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

If you want a fully uniform W8A8 model, simply changing all the bit-widths in act_7.77.yaml to 8 should work; a sketch of that edit follows.
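
A minimal sketch of that edit, assuming the YAML maps layer names to integer bit-widths (the actual schema may nest the values differently):

```python
# Minimal sketch: set every bit-width entry in act_7.77.yaml to 8 and save
# the result as a new config. Assumes integer leaves are bit-widths; adjust
# if the actual schema differs.
import yaml

path = "./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

def set_bits(node):
    # Recursively overwrite every integer leaf with 8 (leave bools alone,
    # since bool is a subclass of int in Python).
    if isinstance(node, dict):
        return {k: set_bits(v) for k, v in node.items()}
    if isinstance(node, bool):
        return node
    if isinstance(node, int):
        return 8
    return node

with open("act_8.00.yaml", "w") as f:
    yaml.safe_dump(set_bits(cfg), f)
```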

@greasebig (Author)

> I see. Uniform W8A8 quantization will generate images with visual degradation. Therefore, we identify the top 1% most sensitive layers and preserve them in FP to maintain performance. […] If you want a fully uniform W8A8 model, simply changing all the bit-widths in act_7.77.yaml to 8 should work.

I know these commands; using them produces good results. But my question is how to obtain these configs myself:

# Mixed Precision Quant Inference
WEIGHT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/weight/weight_8.00.yaml"  # [weight_5.02.yaml, weight_8.00.yaml]
ACT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

I tried the steps listed in https://github.com/A-suozhang/MixDQ/blob/master/mixed_precision_scripts/mixed_precision_search.md; however, they only generate blurry images like the ones I posted above.

@A-suozhang (Member)

If you simply want to conduct W8A8 quantization, you could set all the bit-widths in WEIGHT_MP_CFG and ACT_MP_CFG to 8-bit and keep the act_protect layers.

If you want to search your own mixed precision configuration: after acquiring the layer sensitivities, you may need to run the integer programming several times with different seeds / target bit-widths to generate a few candidate mixed precision configurations, then select the optimal one based on the visual quality of the actual generated images (a sketch of such a sweep is below).
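
To illustrate the kind of sweep meant here (a sketch only; "integer_programming.py" and its flags are placeholders, substitute the actual Phase-3 script and arguments from mixed_precision_search.md):

```python
# Sketch of a candidate sweep: rerun the Phase-3 integer programming with
# several seeds and target bit-widths, keeping each resulting config.
# The script name and flags below are placeholders, not the repo's CLI.
import subprocess

for target_bits in ["7.5", "7.77", "8.0"]:
    for seed in [0, 42, 1234]:
        outdir = f"./logs/mp_search/bits{target_bits}_seed{seed}"
        subprocess.run(
            ["python", "mixed_precision_scripts/integer_programming.py",
             "--target_bits", target_bits, "--seed", str(seed),
             "--outdir", outdir],
            check=True,
        )
# Then generate images with each candidate config via quant_txt2img.py and
# pick the one with the best visual quality.
```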

@A-suozhang (Member)

For more details of the search process, you may refer to the Appendix of our paper. Sorry for the unclear description in mixed_precision_search.md; we will revise it to make it clearer.

@greasebig (Author)

greasebig commented Jul 2, 2024

> For more details of the search process, you may refer to the Appendix of our paper. […]

Also, I hope you can disclose more about the process of acquiring the act_protect config:

ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

Currently, mixed_precision_search.md just uses it directly and doesn't show how to obtain it.

@A-suozhang (Member)

The act_sensitive_a8_1% file contains the top 1% of layers ranked by layer sensitivity. Specifically, we choose the top 1% of layers within each group, ranked by that group's metric. We will supplement this part of the code in a future update. (A rough sketch of the selection is below.)
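
Pending that update, the selection can be sketched roughly as follows; the {group: {layer: score}} input layout is an assumption about the Phase-2 sensitivity output, not the repo's actual format:

```python
# Rough sketch of building act_sensitive_a8_1%.pt: within each layer group,
# keep the top 1% of layers ranked by that group's sensitivity metric.
# The input layout is assumed, not the repo's actual file format.
import math
import torch

sensitivity = torch.load("act_sensitivity.pt")  # {group: {layer: score}}

protected = []
for group, scores in sensitivity.items():
    k = max(1, math.ceil(0.01 * len(scores)))  # top 1%, at least one layer
    ranked = sorted(scores, key=scores.get, reverse=True)
    protected.extend(ranked[:k])

torch.save(protected, "act_sensitive_a8_1%.pt")
```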
