Text-guided Graph Temporal Modeling for few-shot video classification

Authors
Deng, Fuqin [1 ,6 ,7 ]
Zhong, Jiaming [1 ,3 ]
Li, Nannan [2 ]
Fu, Lanhui [1 ]
Jiang, Bingchun [3 ]
Yi, Ningbo [5 ]
Qi, Feng [4 ]
Xin, He [4 ]
Lam, Tin Lun [7 ]
Affiliations
[1] Wuyi Univ, Sch Elect & Informat Engn, Jiangmen, Peoples R China
[2] Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau, Peoples R China
[3] Guangdong Univ Sci & Technol, Sch Mech & Elect Engn, Dongguan, Peoples R China
[4] Wuyi Univ, Sch Appl Phys & Mat Sci, Jiangmen, Peoples R China
[5] Wuyi Univ, Sch Text Mat & Engn, Jiangmen, Peoples R China
[6] Shenzhen Vatop Semicon Tech Co Ltd, Shenzhen, Peoples R China
[7] Chinese Univ Hong Kong, Shenzhen Inst Artificial Intelligence & Robot Soc, Sch Sci & Engn, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Few-shot video classification; Multi-modal learning; Large model application; Graph Temporal Network;
DOI
10.1016/j.engappai.2024.109076
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Large-scale pre-trained models and graph neural networks have recently demonstrated remarkable success in few-shot video classification tasks. However, they generally suffer from two key limitations: (i) the temporal relations between adjacent frames tend to be ambiguous due to the lack of explicit temporal modeling; and (ii) the absence of multi-modal semantic knowledge in query videos results in inaccurate prototype construction and an inability to achieve multi-modal temporal alignment metrics. To address these issues, we develop a Text-guided Graph Temporal Modeling (TgGTM) method that consists of two crucial components: a text-guided feature refinement module and a learnable Query text-token contrastive objective. Specifically, the former leverages a temporal masking layer to guide the model in learning temporal relationships between adjacent frames. Additionally, it utilizes multi-modal information to refine video prototypes for comprehensive few-shot video classification. The latter addresses the feature discrepancy between multi-modal support features and single-modal query features by aligning a learnable Query text-token with the corresponding base-class text descriptions. Extensive experiments on four commonly used benchmarks demonstrate the effectiveness of our proposed method, which achieves mean accuracies of 54.4%, 80.3%, 91.9%, and 96.2% for 5-way 1-shot classification on SSV2-Small, HMDB51, Kinetics, and UCF101, respectively. These results are superior to those of existing state-of-the-art methods. A detailed ablation study shows the importance of learning temporal relationships between adjacent frames and of obtaining the Query text-token. The source code and models will be publicly available at https://github.com/JiaMingZhong2621/TgGTM.
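As a rough illustration of the prototype-based few-shot setting the abstract refers to, the sketch below classifies a query video by cosine similarity to class prototypes that blend a pooled support video feature with a class text feature. It is a generic toy example, not the TgGTM implementation: the function names, the mixing weight `alpha`, the 3-dimensional feature vectors, and the class labels are all assumptions made for illustration, and the temporal masking layer and Query text-token objective are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(frames):
    """Average per-frame feature vectors into one video-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def build_prototypes(support, text_features, alpha=0.5):
    """Blend each class's pooled support video with its text feature.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    prototypes = {}
    for label, frames in support.items():
        video = mean_pool(frames)
        text = text_features[label]
        prototypes[label] = [alpha * v + (1 - alpha) * t for v, t in zip(video, text)]
    return prototypes


def classify(query_frames, prototypes):
    """Return the label whose prototype is most similar to the pooled query."""
    q = mean_pool(query_frames)
    return max(prototypes, key=lambda label: cosine(q, prototypes[label]))


# Toy 3-way 1-shot episode: one support video (two frames) per class.
support = {
    "push": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "pull": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "drop": [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8]],
}
# Made-up text features standing in for class text descriptions.
text_features = {
    "push": [1.0, 0.0, 0.0],
    "pull": [0.0, 1.0, 0.0],
    "drop": [0.0, 0.0, 1.0],
}

prototypes = build_prototypes(support, text_features)
query = [[0.1, 0.8, 0.1], [0.0, 0.9, 0.1]]
prediction = classify(query, prototypes)
print(prediction)
```

Note that the query here is represented by video features only while the prototypes also carry text information, which mirrors the support/query feature discrepancy the abstract describes.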
Pages: 12