
MinimalisticVQA


This is the repository for the models proposed in the paper "Analysis of video quality datasets via design of minimalistic video quality models" (TPAMI version | arXiv version).

Introduction

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, improvements in BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only from basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer, to varying degrees, from the easy-dataset problem, and that some even admit blind image quality assessment (BIQA) solutions. We further justify our claims by comparing the generalization capabilities of our models across these VQA datasets and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and, meanwhile, shed light on good practices for constructing next-generation VQA datasets and models.

Model Definitions of MinimalisticVQA

| Model      | Spatial Quality Analyzer                    | Temporal Quality Analyzer | Weights trained on LSVQ |
| :--------- | :------------------------------------------ | :------------------------ | :---------------------- |
| Model I    | ResNet-50 (ImageNet-1k)                     | None                      | weights                 |
| Model II   | ResNet-50 (pre-trained on IQA datasets)     | None                      | same as Model I         |
| Model III  | ResNet-50 (pre-trained on the LSVQ dataset) | None                      |                         |
| Model IV   | ResNet-50 (ImageNet-1k)                     | SlowFast                  | weights                 |
| Model V    | ResNet-50 (pre-trained on IQA datasets)     | SlowFast                  |                         |
| Model VI   | ResNet-50 (pre-trained on the LSVQ dataset) | SlowFast                  |                         |
| Model VII  | Swin-B (ImageNet-1k)                        | None                      | weights                 |
| Model VIII | Swin-B (pre-trained on the LSVQ dataset)    | None                      | same as Model VII       |
| Model IX   | Swin-B (ImageNet-1k)                        | SlowFast                  | weights                 |
| Model X    | Swin-B (pre-trained on the LSVQ dataset)    | SlowFast                  | same as Model IX        |
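
The table maps onto a small amount of code. Below is a minimal sketch (not the repository's implementation; the class name, argument names, and feature dimensions are our assumptions) of a Model IV/IX-style pipeline: per-frame features from a pretrained spatial backbone, precomputed SlowFast clip features as the optional temporal branch, and a linear quality regressor. The 2304-d temporal dimension assumes SlowFast-R50 features (2048-d slow + 256-d fast pathway).

import torch
import torch.nn as nn
import torchvision.models as models

class MinimalisticBVQA(nn.Module):
    """Hypothetical sketch: spatial backbone + optional temporal features + linear regressor."""

    def __init__(self, temporal_feat_dim=2304, use_temporal=True):
        super().__init__()
        # Spatial quality analyzer: ImageNet-1k ResNet-50, fc layer removed,
        # so the output is the globally pooled 2048-d feature per frame.
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])
        self.use_temporal = use_temporal
        in_dim = 2048 + (temporal_feat_dim if use_temporal else 0)
        # Quality regressor: the simplest possible instantiation, a linear layer.
        self.regressor = nn.Linear(in_dim, 1)

    def forward(self, frames, temporal_feats=None):
        # frames: (batch, num_frames, 3, H, W); temporal_feats: (batch, num_frames, D)
        b, t = frames.shape[:2]
        x = self.spatial(frames.flatten(0, 1)).flatten(1)        # (b*t, 2048)
        if self.use_temporal and temporal_feats is not None:
            x = torch.cat([x, temporal_feats.flatten(0, 1)], 1)  # append SlowFast features
        scores = self.regressor(x).view(b, t)                    # one score per frame
        return scores.mean(dim=1)                                # average to a video score

Swapping the ResNet-50 backbone for Swin-B and setting use_temporal=False would correspond to the other rows of the table.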

Usage

Test Datasets

For a detailed introduction to these datasets, please refer to the paper.

Train the model

  • Extract the video frames (a minimal sketch of this step appears after this list):
python -u frame_extraction/extract_frame.py \
--dataset KoNViD1k \
--dataset_file data/KoNViD-1k_data.mat \
--videos_dir /data/sunwei_data/konvid1k \
--save_folder /data/sunwei_data/video_data/KoNViD1k/image_384p \
--video_length_min 10 \
--resize 384 \
>> logs/extract_frame_KoNViD1k_384p.log
  • Extract the temporal features:
CUDA_VISIBLE_DEVICES=0 python -u temporal_feature_extraction/extract_temporal_feature.py \
--dataset KoNViD1k \
--dataset_file data/KoNViD-1k_data.mat \
--videos_dir /data/sunwei_data/konvid1k \
--feature_save_folder /data/sunwei_data/video_data/KoNViD1k/temporal_feature_mid_sr_1 \
--sample_type mid \
--sample_rate 1 \
--resize 224 \
>> logs/extract_feature_KoNViD1k_temporal_feature_mid_sr_1.log
  • Train the model:
CUDA_VISIBLE_DEVICES=0,1 python -u train_BVQA.py \
--dataset KoNViD1k \
--model_name Model_IX \
--datainfo data/KoNViD-1k_data.mat \
--videos_dir /data/sunwei_data/video_data/KoNViD1k/image_384p \
--lr 0.00001 \
--decay_ratio 0.9 \
--decay_interval 10 \
--print_samples 400 \
--train_batch_size 6 \
--num_workers 8 \
--resize 384 \
--crop_size 384 \
--epochs 30 \
--ckpt_path /data/sunwei_data/video_data/MinimalisticVQA_model/KoNViD1k/ \
--multi_gpu \
--n_exp 10 \
--sample_rate 1 \
--feature_dir /data/sunwei_data/video_data/KoNViD1k/temporal_feature_mid_sr_1 \
>> logs/train_BVQA_KoNViD1k_Model_IX.log
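
The frame-extraction step above boils down to temporal subsampling plus spatial resizing. Here is a minimal sketch of what it does (an assumption, not the repository's extract_frame.py; function and path names are hypothetical): sample roughly one frame per second and resize the shorter side to 384 pixels, matching --resize 384.

import os
import cv2

def extract_frames(video_path, save_folder, short_side=384):
    """Save roughly one frame per second, shorter side resized to `short_side`."""
    os.makedirs(save_folder, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing
    step = max(1, int(round(fps)))            # frame interval for ~1 fps sampling
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            h, w = frame.shape[:2]
            scale = short_side / min(h, w)    # resize the shorter side to 384
            frame = cv2.resize(frame, (int(w * scale), int(h * scale)))
            cv2.imwrite(os.path.join(save_folder, f"{saved:03d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()

extract_frames("test.mp4", "image_384p/test")  # hypothetical paths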

Test

Download a trained model (e.g., Model IX) and the scaling file (for quality rescaling) trained on LSVQ, then replace --model_path, --popt_path, --video_name, and --video_path below with your own model file, scaling file, and video:

CUDA_VISIBLE_DEVICES=0 python -u test_video.py \
--model_path /data/sunwei_data/video_data/MinimalisticVQA_model/LSVQ/MinimalisticVQA_Model_IX_LSVQ.pth \
--popt_path popt/LSVQ_Model_IX.npy \
--model_name Model_IX \
--video_name Zebra_Mussels_Not_Welcome_Here.mp4 \
--video_path /data/sunwei_data/LSVQ/ia-batch1 \
--resize 384 \
--crop_size 384 \
--video_number_min 8 \
--sample_rate 1 \
--sample_type mid \
--output logs/video_score.log \
--is_gpu
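
The --popt_path file presumably stores the parameters of a monotonic mapping, fitted on LSVQ, that rescales raw model outputs to the MOS range; the exact functional form is our assumption. A minimal sketch, assuming the standard four-parameter logistic commonly used in quality-assessment evaluation:

import numpy as np

def logistic_4(x, beta1, beta2, beta3, beta4):
    # Monotonic 4-parameter logistic mapping raw predictions onto the MOS scale.
    return (beta1 - beta2) / (1 + np.exp(-(x - beta3) / np.abs(beta4))) + beta2

popt = np.load("popt/LSVQ_Model_IX.npy")  # assumed to hold the four fitted parameters
raw_score = 0.37                          # hypothetical raw model output
print(logistic_4(raw_score, *popt))       # rescaled quality score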

Citation

If you find this code useful for your research, please cite:

@article{sun2024analysis,
  title={Analysis of video quality datasets via design of minimalistic video quality models},
  author={Sun, Wei and Wen, Wen and Min, Xiongkuo and Lan, Long and Zhai, Guangtao and Ma, Kede},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  publisher={IEEE}
}