Efficient video quality assessment with deeper spatiotemporal feature extraction and integration

Cited by: 5
Authors
Liu, Yinhao [1]
Zhou, Xiaofei [2]
Yin, Haibing [1]
Wang, Hongkui [1]
Yan, Chenggang [2]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Automat, Hangzhou, Peoples R China
Keywords
video quality assessment; no-reference/blind; user-generated content; deep learning; deeper temporal correlation; reverse hierarchy theory
DOI
10.1117/1.JEI.30.6.063034
Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract
The challenge in modeling video quality assessment (VQA) for user-generated content (UGC), i.e., UGC-VQA, is to extract discriminative features accurately and to quantify interfeature interactions in a way that follows the behavior of human eye-brain visual perception. To address this issue, we propose the Deeper Spatial-Temporal Scoring Network (DSTS-Net) for precise VQA. Concretely, we first deploy a multiscale feature extraction module to characterize content-aware features, accounting for the nonlinear reverse hierarchy theory of the video perception process, which the reported UGC-VQA models do not fully consider. Hierarchical handcrafted and semantic features are considered simultaneously using content-adaptive weighting. Second, we develop a feature integration structure, the deeper gated recurrent unit (DGRU), to imitate the interfeature interactions in visual perception, including feedforward and feedback processes. Third, a dual DGRU structure is employed to further account for interframe interactions of hierarchical features, imitating the nonlinearity of perception as closely as possible. Finally, improved pooling is achieved in a local adaptive smoothing module that accounts for temporal hysteresis. Validation of the proposed method on four challenging public UGC-VQA datasets shows performance comparable to state-of-the-art no-reference VQA methods; in particular, our method accurately predicts the quality of low-quality videos with weak temporal correlation. To promote reproducible research and public evaluation, an implementation of our method has been made available online: https://github.com/liu0527aa/DSTS-Net. © 2021 SPIE and IS&T
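The abstract's final step, pooling frame-level scores while accounting for temporal hysteresis, can be illustrated with a small sketch. This is not the paper's local adaptive smoothing module, whose details the abstract does not give. It is a generic hysteresis-style pooling written under stated assumptions: viewers remember recent poor frames (a minimum over a past window) and weight poor upcoming frames more heavily (a softmin-weighted mean). The function name `hysteresis_pool` and the parameters `tau` and `gamma` are hypothetical.

```python
import math


def hysteresis_pool(frame_scores, tau=12, gamma=0.5):
    """Pool per-frame quality scores into a single video score.

    Assumed scheme (not taken from the paper): each frame's score blends
    - a memory term: the minimum score over the previous `tau` frames, and
    - a current term: a softmin-weighted mean over the next `tau` frames,
      so that low-quality frames receive larger weights.
    The video score is the mean of the blended per-frame scores.
    """
    n = len(frame_scores)
    if n == 0:
        raise ValueError("frame_scores must not be empty")

    pooled = []
    for t in range(n):
        # Memory term: worst score in frames [t - tau, t - 1].
        # The first frame has no history, so it falls back to its own score.
        past = frame_scores[max(0, t - tau):t]
        memory = min(past) if past else frame_scores[t]

        # Current term: softmin-weighted mean over frames [t, t + tau - 1].
        window = frame_scores[t:t + tau]
        weights = [math.exp(-q) for q in window]
        current = sum(w * q for w, q in zip(weights, window)) / sum(weights)

        pooled.append(gamma * memory + (1.0 - gamma) * current)

    return sum(pooled) / n


if __name__ == "__main__":
    # A short quality drop in an otherwise good video (hypothetical scores).
    scores = [0.8] * 10 + [0.2] * 2 + [0.8] * 10
    print("plain mean:     ", sum(scores) / len(scores))
    print("hysteresis pool:", hysteresis_pool(scores))
```

With these assumptions, a brief quality drop lowers the pooled score more than a plain average would, because the drop persists in the memory term for the following `tau` frames.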
Pages: 21