Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited: 0
Authors
Liu, Tengfei [1]
Hu, Yongli [1]
Gao, Junbin [2]
Sun, Yanfeng [1]
Yin, Baocai [1]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has achieved considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words organized in complex structures. Moreover, compared with flat text, long documents usually carry multi-modal content such as images, which provides rich information that has not yet been exploited for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks adaptively integrate the multi-scale textual and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two newly constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and beats the state-of-the-art multi-modal baselines.
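The abstract gives only a high-level description of the two components. As a minimal sketch, assuming token-level relevance scoring for the collaborative pooling and a sigmoid-gated residual update for the feature shifting (both our own assumptions, not the paper's published design, and all class and variable names hypothetical), the two ideas could look like this in PyTorch:

```python
import torch
import torch.nn as nn


class CollaborativePooling(nn.Module):
    """Sketch: score fine-grained text tokens against a global visual
    feature and keep only the top-k, shrinking the sequence that the
    later fusion stages must process (assumed form of the paper's
    multi-modal collaborative pooling)."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)  # joint text-visual relevance score
        self.keep_ratio = keep_ratio

    def forward(self, text_tokens: torch.Tensor, visual_global: torch.Tensor) -> torch.Tensor:
        # text_tokens: (B, N, D); visual_global: (B, D)
        B, N, D = text_tokens.shape
        vis = visual_global.unsqueeze(1).expand(-1, N, -1)                    # (B, N, D)
        scores = self.score(torch.cat([text_tokens, vis], -1)).squeeze(-1)    # (B, N)
        k = max(1, int(N * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                                   # top-k token ids
        return torch.gather(text_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))


class FeatureShift(nn.Module):
    """Sketch: shift one modality's features toward the other at a single
    granularity via a learned sigmoid gate (hypothetical realization of
    the 'feature shifting' idea)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x, y: (B, N, D) aligned features of the two modalities at one scale
        g = self.gate(torch.cat([x, y], -1))
        return x + g * y  # inject the other modality where the gate opens


if __name__ == "__main__":
    B, N, D = 2, 128, 64
    text = torch.randn(B, N, D)
    img_global = torch.randn(B, D)
    pooled = CollaborativePooling(D)(text, img_global)  # (2, 64, 64)
    img_tokens = torch.randn(B, pooled.shape[1], D)
    fused = FeatureShift(D)(pooled, img_tokens)         # (2, 64, 64)
    print(pooled.shape, fused.shape)
```

In the actual model, the shift would presumably be applied at multiple granularities (e.g., token, sentence, and document scales) and across both modalities; the gating form above is illustrative only.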
Pages: 24