Text-guided Graph Temporal Modeling for few-shot video classification

Cited by: 0
Authors
Deng, Fuqin [1 ,6 ,7 ]
Zhong, Jiaming [1 ,3 ]
Li, Nannan [2 ]
Fu, Lanhui [1 ]
Jiang, Bingchun [3 ]
Yi, Ningbo [5 ]
Qi, Feng [4 ]
Xin, He [4 ]
Lam, Tin Lun [7 ]
Affiliations
[1] Wuyi Univ, Sch Elect & Informat Engn, Jiangmen, Peoples R China
[2] Macau Univ Sci & Technol, Fac Innovat Engn, Sch Comp Sci & Engn, Macau, Peoples R China
[3] Guangdong Univ Sci & Technol, Sch Mech & Elect Engn, Dongguan, Peoples R China
[4] Wuyi Univ, Sch Appl Phys & Mat Sci, Jiangmen, Peoples R China
[5] Wuyi Univ, Sch Text Mat & Engn, Jiangmen, Peoples R China
[6] Shenzhen Vatop Semicon Tech Co Ltd, Shenzhen, Peoples R China
[7] Chinese Univ Hong Kong, Shenzhen Inst Artificial Intelligence & Robot Soc, Sch Sci & Engn, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Few-shot video classification; Multi-modal learning; Large model application; Graph Temporal Network;
DOI
10.1016/j.engappai.2024.109076
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Large-scale pre-trained models and graph neural networks have recently demonstrated remarkable success in few-shot video classification tasks. However, they generally suffer from two key limitations: i) the temporal relations between adjacent frames tend to be ambiguous due to the lack of explicit temporal modeling; ii) the absence of multi-modal semantic knowledge in query videos results in inaccurate prototype construction and prevents multi-modal temporal alignment metrics. To address these issues, we develop a Text-guided Graph Temporal Modeling (TgGTM) method that consists of two crucial components: a text-guided feature refinement module and a learnable Query text-token contrastive objective. Specifically, the former leverages a Temporal masking layer to guide the model in learning temporal relationships between adjacent frames, and utilizes multi-modal information to refine video prototypes for comprehensive few-shot video classification. The latter addresses the feature discrepancy between multi-modal support features and single-modal query features by aligning a learnable Query text-token with the corresponding base-class text descriptions. Extensive experiments on four commonly used benchmarks demonstrate the effectiveness of the proposed method, which achieves mean accuracies of 54.4%, 80.3%, 91.9%, and 96.2% for 5-way 1-shot classification on SSV2-Small, HMDB51, Kinetics, and UCF101, respectively, surpassing existing state-of-the-art methods. A detailed ablation study shows the importance of learning temporal relationships between adjacent frames and of obtaining the Query text-token. The source code and models will be publicly available at https://github.com/JiaMingZhong2621/TgGTM.
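
The record gives no implementation details beyond the abstract, so the following is a minimal, illustrative sketch (not the authors' released code) of the two ideas named above: a Temporal masking layer that restricts frame-to-frame attention to adjacent frames, and an InfoNCE-style contrastive objective that aligns a learnable Query text-token with base-class text embeddings. All module names, shapes, and hyperparameters here are assumptions.

# Hedged sketch of the two components described in the TgGTM abstract.
# Names and settings are illustrative, not taken from the authors' repository.
import torch
import torch.nn as nn
import torch.nn.functional as F


def adjacent_frame_mask(num_frames: int) -> torch.Tensor:
    """Boolean (T, T) mask: True where a frame may attend (itself and neighbours)."""
    idx = torch.arange(num_frames)
    return (idx[:, None] - idx[None, :]).abs() <= 1


class TemporalMaskingLayer(nn.Module):
    """Self-attention over per-frame features, masked to adjacent frames only."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:  # frames: (B, T, D)
        T = frames.size(1)
        # PyTorch convention: True in attn_mask means the position is blocked.
        blocked = ~adjacent_frame_mask(T).to(frames.device)
        out, _ = self.attn(frames, frames, frames, attn_mask=blocked)
        return out + frames  # residual connection


def query_text_token_contrastive_loss(
    query_tokens: torch.Tensor,    # (B, D) learnable text-tokens for query videos
    class_text_emb: torch.Tensor,  # (C, D) base-class text description embeddings
    labels: torch.Tensor,          # (B,) index of the matching base class per query
    temperature: float = 0.07,
) -> torch.Tensor:
    """InfoNCE-style alignment of Query text-tokens with base-class text embeddings."""
    q = F.normalize(query_tokens, dim=-1)
    t = F.normalize(class_text_emb, dim=-1)
    logits = q @ t.t() / temperature  # (B, C) cosine-similarity logits
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    B, T, D, C = 4, 8, 512, 64
    refined = TemporalMaskingLayer(D)(torch.randn(B, T, D))
    loss = query_text_token_contrastive_loss(
        torch.randn(B, D), torch.randn(C, D), torch.randint(0, C, (B,))
    )
    print(refined.shape, loss.item())

In an actual training loop, query_tokens would be produced per query video and class_text_emb would come from a frozen text encoder (e.g., a CLIP-style model); both are stand-ins here.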
Pages: 12
Related Papers
50 items in total
  • [31] Few-shot English Text Classification Method Based On Graph Convolutional Network And Prompt Learning
    Jin, Yunfei
    JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2025, 28 (09): 1777-1784
  • [32] DCCL: Distance-coefficient guided Clustering with Contrastive Learning for Few-shot Text Classification
    Wang, Han
    Gu, Chunhua
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024: 85-90
  • [33] Generalized Few-Shot Video Classification With Video Retrieval and Feature Generation
    Xian, Yongqin
    Korbar, Bruno
    Douze, Matthijs
    Torresani, Lorenzo
    Schiele, Bernt
    Akata, Zeynep
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12): 8949-8961
  • [34] Learning Dual-Routing Capsule Graph Neural Network for Few-Shot Video Classification
    Feng, Yangbo
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 3204-3216
  • [35] ContrastNet: A Contrastive Learning Framework for Few-Shot Text Classification
    Chen, Junfan
    Zhang, Richong
    Mao, Yongyi
    Xu, Jie
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 10492-10500
  • [36] Enhanced Prompt Learning for Few-shot Text Classification Method
    Li R.
    Wei Z.
    Fan Y.
    Ye S.
    Zhang G.
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, 60 (01): 1-12
  • [37] Mutual Learning Prototype Network for Few-Shot Text Classification
    Liu, Jun
    Qin, Xiaorui
    Tao, Jian
    Dong, Hongfei
    Li, Xiaoxu
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2024, 47 (03): 30-35
  • [38] Boosting Few-Shot Text Classification via Distribution Estimation
    Liu, Han
    Zhang, Feng
    Zhang, Xiaotong
    Zhao, Siyang
    Ma, Fenglong
    Wu, Xiao-Ming
    Chen, Hongyang
    Yu, Hong
    Zhang, Xianchao
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023: 13219-13227
  • [39] Hierarchical Attention Prototypical Networks for Few-Shot Text Classification
    Sun, Shengli
    Sun, Qingfeng
    Zhou, Kevin
    Lv, Tengchao
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 476-485
  • [40] Supervised Graph Contrastive Learning for Few-Shot Node Classification
    Tan, Zhen
    Ding, Kaize
    Guo, Ruocheng
    Liu, Huan
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II, 2023, 13714: 394-411