
Got bad results: quantized with 1024 images using sdxl-turbo, v1.3.1 normal inference, following your steps #9

Open
greasebig opened this issue Jun 28, 2024 · 17 comments

Comments

@greasebig

No description provided.

@greasebig (Author)

greasebig commented Jun 28, 2024

I am really curious whether your results can be fully reproduced. Using your provided quantization config gives good results, but when I quantize sdxl-turbo from "scratch", I cannot get the expected results, only blurry images.

@A-suozhang (Member)

Apologies for the delayed response. There may be an underlying issue in your quantization process. Could you please provide more detailed information about your experimental settings (what does "from scratch" mean here), so that we can help you with the problem?

@greasebig (Author)

Here is my process (a driver-script sketch of the same steps follows the list):
1.1 Generate Calibration Data : CUDA_VISIBLE_DEVICES=$1 python scripts/gen_calib_data.py --config ./configs/stable-diffusion/$config_name --save_image_path ./debug_imgs
1.2 Post Training Quantization (PTQ) Process : CUDA_VISIBLE_DEVICES=$2 python scripts/ptq.py --config ./configs/stable-diffusion/${cfg_name} --outdir ./logs/$1 --seed 42
1.3 Inference Quantized Model : CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8
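
For reference, this is the whole pipeline I ran as one driver script (a minimal sketch; the config name, GPU id, and log directory are placeholders from my setup, not repo defaults):

```python
# Minimal sketch of the three steps as one driver; CONFIG and LOGDIR are
# placeholders from my setup, not defaults shipped with the repo.
import os
import subprocess

CONFIG = "./configs/stable-diffusion/sdxl-turbo.yaml"  # assumed config name
LOGDIR = "./logs/sdxl-turbo-1024fp32"

def run(cmd):
    # Echo and execute one pipeline step on GPU 0, aborting on failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True,
                   env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"})

# 1.1 Generate calibration data
run(["python", "scripts/gen_calib_data.py", "--config", CONFIG,
     "--save_image_path", "./debug_imgs"])
# 1.2 Post Training Quantization (PTQ)
run(["python", "scripts/ptq.py", "--config", CONFIG,
     "--outdir", LOGDIR, "--seed", "42"])
# 1.3 Inference with the quantized model
run(["python", "scripts/quant_txt2img.py", "--base_path", LOGDIR,
     "--batch_size", "2", "--num_imgs", "8"])
```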

After the above process, I got results like this:

[image]

@greasebig (Author)

Then, I tried the Mixed Precision Search process:
Phase 1: PTQ : python scripts/ptq.py --config ./configs/stable-diffusion/sdxl-turbo.yaml --outdir --seed 42
Phase 2: Get Sensitivity : ...
Phase 3: Integer Programming : ...
Phase 4: Choose the optimal config : ...
Inference with the mixed precision quantized model (I used my own optimal config obtained from the above process):
python scripts/quant_txt2img.py --base_path ./logs/sdxl-turbo-1024fp32 --config_weight_mp ./logs/sdxl-turbo-1024fp32/weight_4.73_0.96.yaml --config_act_mp ./logs/sdxl-turbo-1024fp32/act_7.50_0.95.yaml --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt --image_folder ./logs/sdxl-turbo-1024fp32/generated_images_weight_4.73_0.96_act_7.50_0.95

Got results like this:
[image]

@A-suozhang (Member)

> Here is my process: 1.1 Generate Calibration Data […] 1.2 Post Training Quantization (PTQ) […] 1.3 Inference Quantized Model […]
>
> After the above process, I got results like this: [image]

This process conducts uniform bit-width W8A8 quantization without mixed precision, which produces unsatisfying results. You could try adding --act_protect to the existing command.

@greasebig (Author)

> Then, I tried the Mixed Precision Search process […] python scripts/quant_txt2img.py --base_path ./logs/sdxl-turbo-1024fp32 […]
>
> Got results like this: [image]

I have already tried --act_protect here.

@A-suozhang (Member)

> Then, I tried the Mixed Precision Search process […] Got results like this: [image]
>
> I have already tried --act_protect here.

Actually, I meant adding act_protect to the uniform bit-width W8A8 command:

CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt

@A-suozhang (Member)

> Then, I tried the Mixed Precision Search process […]
>
> Got results like this: [image]

This phenomenon is probably due to a sub-optimal mixed precision configuration. Does the mixed precision search process raise any errors?

@greasebig (Author)

> Actually, I meant adding act_protect to the uniform bit-width W8A8 command:
>
> CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt

[image]
I needed to change your code to apply --act_protect when --config_act_mp is not given; a sketch of the change follows.
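The change I made was along these lines (a hypothetical sketch; the actual identifiers in scripts/quant_txt2img.py differ):

```python
# Hypothetical sketch of the patch: apply the act_protect list even when no
# mixed-precision activation config is given. The names (args, model,
# set_layer_fp) are illustrative, not the script's actual identifiers.
import torch

def apply_act_protect(model, args, set_layer_fp):
    # Originally the protected layers were only restored to FP inside the
    # --config_act_mp branch; lift that check so --act_protect works alone.
    if args.act_protect is not None:
        protected_layers = torch.load(args.act_protect)  # list of layer names
        for name in protected_layers:
            set_layer_fp(model, name)  # keep this layer's activations in FP
```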

@greasebig (Author)

> This phenomenon is probably due to a sub-optimal mixed precision configuration. Does the mixed precision search process raise any errors?

I just followed your steps to search for the mixed precision config and didn't get any errors.

@greasebig (Author)

> Actually, I meant adding act_protect to the uniform bit-width W8A8 command:
>
> CUDA_VISIBLE_DEVICES=$1 python scripts/quant_txt2img.py --base_path $CKPT_PATH --batch_size 2 --num_imgs 8 --act_protect ./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt

I may try that later.

@A-suozhang (Member)

I see. Uniform W8A8 quantization will generate images with visual degradation. Therefore, we identify the top 1% most sensitive layers and preserve them in FP to maintain performance. To generate images with good quality, act_protect should be specified.

Our mixed precision W8A8 design uses 8-bit for all weights (weight_8.00.yaml) and an average of 7.77 bits for activations, so that, accounting for the FP16-protected layers, the average activation bit-width works out to 8 bits.

(The example configuration we provide:)

# Mixed Precision Quant Inference
WEIGHT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/weight/weight_8.00.yaml"  # [weight_5.02.yaml, weight_8.00.yaml]
ACT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

If you want a fully uniform W8A8 model, simply changing all the bit-widths in act_7.77.yaml to 8 should work; a sketch of that edit follows.
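
A minimal sketch of that edit, assuming the YAML maps layer names to integer bit-widths (the actual schema may nest the values differently):

```python
# Minimal sketch: set every bit-width entry in act_7.77.yaml to 8 and save
# the result as a new config. Assumes integer leaves are bit-widths; adjust
# if the actual schema differs.
import yaml

path = "./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

def set_bits(node):
    # Recursively overwrite every integer leaf with 8 (leave bools alone,
    # since bool is a subclass of int in Python).
    if isinstance(node, dict):
        return {k: set_bits(v) for k, v in node.items()}
    if isinstance(node, bool):
        return node
    if isinstance(node, int):
        return 8
    return node

with open("act_8.00.yaml", "w") as f:
    yaml.safe_dump(set_bits(cfg), f)
```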

@greasebig (Author)

> I see. Uniform W8A8 quantization will generate images with visual degradation. Therefore, we identify the top 1% most sensitive layers and preserve them in FP to maintain performance. […] If you want a fully uniform W8A8 model, simply changing all the bit-widths in act_7.77.yaml to 8 should work.

I know these commands; using them produces good results. But my question is how to obtain these configs myself:

# Mixed Precision Quant Inference
WEIGHT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/weight/weight_8.00.yaml"  # [weight_5.02.yaml, weight_8.00.yaml]
ACT_MP_CFG="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_7.77.yaml"
ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

I tried the steps listed in https://github.com/A-suozhang/MixDQ/blob/master/mixed_precision_scripts/mixed_precision_search.md; however, they only generate blurry images like the ones I posted above.

@A-suozhang (Member)

If you simply want to conduct W8A8 quantization, you could set all the bit-widths in WEIGHT_MP_CFG and ACT_MP_CFG to 8-bit and keep the act_protect layers.

If you want to search your own mixed precision configuration: after acquiring the layer sensitivities, you may need to run the integer programming several times with different seeds / target bit-widths to generate a few candidate mixed precision configurations, then select the optimal one based on the visual quality of the actual generated images (a sketch of such a sweep is below).
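
To illustrate the kind of sweep meant here (a sketch only; "integer_programming.py" and its flags are placeholders, substitute the actual Phase-3 script and arguments from mixed_precision_search.md):

```python
# Sketch of a candidate sweep: rerun the Phase-3 integer programming with
# several seeds and target bit-widths, keeping each resulting config.
# The script name and flags below are placeholders, not the repo's CLI.
import subprocess

for target_bits in ["7.5", "7.77", "8.0"]:
    for seed in [0, 42, 1234]:
        outdir = f"./logs/mp_search/bits{target_bits}_seed{seed}"
        subprocess.run(
            ["python", "mixed_precision_scripts/integer_programming.py",
             "--target_bits", target_bits, "--seed", str(seed),
             "--outdir", outdir],
            check=True,
        )
# Then generate images with each candidate config via quant_txt2img.py and
# pick the one with the best visual quality.
```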

@A-suozhang (Member)

For more details of the search process, you may refer to the Appendix of our paper. Sorry for the unclear description in mixed_precision_search.md; we will revise it to make it clearer.

@greasebig (Author)

greasebig commented Jul 2, 2024

> For more details of the search process, you may refer to the Appendix of our paper. […]

Also, I hope you can disclose more about the process of acquiring the act_protect config:

ACT_PROTECT="./mixed_precision_scripts/mixed_percision_config/sdxl_turbo/final_config/act/act_sensitivie_a8_1%.pt"

Currently, mixed_precision_search.md just uses it directly and doesn't show how to obtain it.

@A-suozhang (Member)

The act_sensitive_a8_1% file contains the top 1% of layers ranked by layer sensitivity. Specifically, we choose the top 1% of layers within each group, ranked by that group's metric. We will supplement this part of the code in a future update. (A rough sketch of the selection is below.)
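
Pending that update, the selection can be sketched roughly as follows; the {group: {layer: score}} input layout is an assumption about the Phase-2 sensitivity output, not the repo's actual format:

```python
# Rough sketch of building act_sensitive_a8_1%.pt: within each layer group,
# keep the top 1% of layers ranked by that group's sensitivity metric.
# The input layout is assumed, not the repo's actual file format.
import math
import torch

sensitivity = torch.load("act_sensitivity.pt")  # {group: {layer: score}}

protected = []
for group, scores in sensitivity.items():
    k = max(1, math.ceil(0.01 * len(scores)))  # top 1%, at least one layer
    ranked = sorted(scores, key=scores.get, reverse=True)
    protected.extend(ranked[:k])

torch.save(protected, "act_sensitive_a8_1%.pt")
```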
