Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited: 0
Authors
Liu, Tengfei [1 ]
Hu, Yongli [1 ]
Gao, Junbin [2 ]
Sun, Yanfeng [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, 100, Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key R&D Program of China
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
CLC Number
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has made considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words and have complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provides rich information that has not yet been exploited for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks adaptively integrate the multi-scale textual and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two newly constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and surpasses state-of-the-art multi-modal baselines.
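The abstract does not give the exact formulation of the multi-modal collaborative pooling block, but the core idea it describes (using the visual modality to discard redundant fine-grained text features and shrink the sequence before fusion) can be illustrated with a minimal, hypothetical sketch. All names (`collaborative_pooling`, `keep_ratio`) and the cosine-similarity scoring rule are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def collaborative_pooling(text_feats, image_feat, keep_ratio=0.5):
    """Hypothetical sketch of multi-modal collaborative pooling.

    Scores each text token by its cosine similarity to a global
    image feature and keeps only the top-scoring tokens, reducing
    the sequence length (and thus the cost of later cross-modal
    interaction) while preserving token order.
    """
    # Normalize tokens and the image feature for cosine similarity.
    t = text_feats / (np.linalg.norm(text_feats, axis=1, keepdims=True) + 1e-8)
    v = image_feat / (np.linalg.norm(image_feat) + 1e-8)
    scores = t @ v                       # one relevance score per token
    k = max(1, int(len(text_feats) * keep_ratio))
    top = np.argsort(scores)[::-1][:k]   # indices of the k best tokens
    keep = np.sort(top)                  # restore original token order
    return text_feats[keep], keep

# Toy usage: 8 text tokens of dimension 4, one global image feature.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 4))
img = rng.normal(size=4)
pooled, kept_idx = collaborative_pooling(text, img, keep_ratio=0.5)
```

With `keep_ratio=0.5`, the 8-token sequence is halved to 4 tokens before fusion; in the paper this kind of reduction is what lowers the computational complexity of the cross-modal interaction over long documents.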
Pages: 24
Related Papers
50 records
  • [21] Semantic Preservation and Hash Fusion Network for Unsupervised Cross-Modal Retrieval
    Shu, Xinsheng
    Li, Mingyong
    WEB AND BIG DATA, APWEB-WAIM 2024, PT V, 2024, 14965 : 146 - 161
  • [22] CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network
    Peng, Yuxin
    Qi, Jinwei
    Huang, Xin
    Yuan, Yuxin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (02) : 405 - 420
  • [23] PCFN: Progressive Cross-Modal Fusion Network for Human Pose Transfer
    Yu, Wei
    Li, Yanping
    Wang, Rui
    Cao, Wenming
    Xiang, Wei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (07) : 3369 - 3382
  • [24] CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition
    Zheng, Jinzhi
    Ji, Ruyi
    Zhang, Libo
    Wu, Yanjun
    Zhao, Chen
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT VI, 2024, 14452 : 421 - 433
  • [25] Multi-Level Cross-Modal Interactive-Network-Based Semi-Supervised Multi-Modal Ship Classification
    Song, Xin
    Chen, Zhikui
    Zhong, Fangming
    Gao, Jing
    Zhang, Jianning
    Li, Peng
    SENSORS, 2024, 24 (22)
  • [26] Object Classification in SAR Imagery With Deep Cross-Modal Transfer Network
    Li, Xue
    Wu, Yuan
    Lai, Zuomei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [27] Cross-modal fusion for multi-label image classification with attention mechanism
    Wang, Yangtao
    Xie, Yanzhao
    Zeng, Jiangfeng
    Wang, Hanpin
    Fan, Lisheng
    Song, Yufan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [29] A Bidirectional Separated Distillation-Based Cross-Modal Interactive Fusion Network for Skeleton-Based Action Recognition
    Wang, Mingdao
    Zhang, Xianlin
    Chen, Siqi
    Li, Xueming
    Zhang, Yue
    IEEE SENSORS JOURNAL, 2025, 25 (01) : 1814 - 1824