FGSVQA: Frequency-Guided Short-form Video Quality Assessment
Abstract
Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address this challenge, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP, and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposi...
Description / Details
Short-form video poses new challenges to the quality assessment of user-generated content (UGC) due to its complex generation pipeline, rapid content variation, and mixed distortions. To address this challenge, we propose an end-to-end video quality assessment (VQA) framework that employs a dense visual encoder based on CLIP, and incorporates compression priors derived from the frequency domain to generate artifact- and structure-aware weight maps for feature aggregation. By explicitly decomposing artifact, structure, and original visual feature branches and adaptively fusing them over time through a learned gating module, the proposed method achieves accurate and efficient quality prediction. Experimental results show that our method achieves strong performance on short-form video datasets in terms of average rank and linear correlation (SRCC: 0.736, PLCC: 0.787), while maintaining efficient inference runtime. The code and additional results are available at: https://github.com/xinyiW915/FGSVQA.
Source: arXiv:2605.20016v1 - http://arxiv.org/abs/2605.20016v1 PDF: https://arxiv.org/pdf/2605.20016v1 Original Link: http://arxiv.org/abs/2605.20016v1
Please sign in to join the discussion.
No comments yet. Be the first to share your thoughts!
May 20, 2026
Biomedical Engineering
Engineering
0