Cross-modal Multiple Granularity Interactive Fusion Network for Long Document Classification

Cited by: 0
Authors
Liu, Tengfei [1 ]
Hu, Yongli [1 ]
Gao, Junbin [2 ]
Sun, Yanfeng [1 ]
Yin, Baocai [1 ]
Affiliations
[1] Beijing Univ Technol, 100 Pingleyuan, Beijing, Peoples R China
[2] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Long document classification; multi-modal collaborative pooling; cross-modal multi-granularity interactive fusion;
DOI
10.1145/3631711
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Long Document Classification (LDC) has attracted great attention in Natural Language Processing and has achieved considerable progress owing to large-scale pre-trained language models. Nevertheless, as a problem distinct from traditional text classification, LDC is far from settled. Long documents, such as news stories and articles, generally contain thousands of words with complex structures. Moreover, compared with flat text, long documents usually contain multi-modal content such as images, which provides rich information that has not yet been exploited for classification. In this article, we propose a novel cross-modal method for long document classification, in which multiple granularity feature shifting networks adaptively integrate the multi-scale text and visual features of long documents. Additionally, a multi-modal collaborative pooling block is proposed to eliminate redundant fine-grained text features and simultaneously reduce computational complexity. To verify the effectiveness of the proposed model, we conduct experiments on the Food101 dataset and two newly constructed multi-modal long document datasets. The experimental results show that the proposed cross-modal method outperforms single-modal text methods and the state-of-the-art multi-modal baselines.
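The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of the two ideas it mentions: a collaborative pooling step that prunes redundant fine-grained text tokens using the visual context, and a gated fusion of the pooled text and visual features. All class names, parameters, and design details here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): prune fine-grained text tokens
# with a visually conditioned score, then fuse text and image features with a
# learned gate before classification.
import torch
import torch.nn as nn


class CollaborativePooling(nn.Module):
    """Keep only the top-k text tokens scored against the pooled visual
    context, discarding redundant fine-grained tokens (assumed design)."""

    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.keep_ratio = keep_ratio

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (B, Lt, D), image: (B, Lv, D)
        img_ctx = image.mean(dim=1, keepdim=True)          # (B, 1, D)
        scores = self.score(text * img_ctx).squeeze(-1)    # (B, Lt)
        k = max(1, int(text.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, text.size(-1))
        return torch.gather(text, 1, idx)                  # (B, k, D)


class GatedCrossModalFusion(nn.Module):
    """Fuse pooled text and visual features with a learned gate (assumed)."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.pool = CollaborativePooling(dim)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        text = self.pool(text, image)                      # drop redundant tokens
        t, v = text.mean(dim=1), image.mean(dim=1)         # (B, D) each
        g = self.gate(torch.cat([t, v], dim=-1))           # per-dimension gate
        fused = g * t + (1.0 - g) * v                      # gated mixture
        return self.classifier(fused)


if __name__ == "__main__":
    model = GatedCrossModalFusion(dim=768, num_classes=4)
    text = torch.randn(2, 512, 768)    # e.g. BERT-style token features
    image = torch.randn(2, 49, 768)    # e.g. ViT-style patch features
    print(model(text, image).shape)    # torch.Size([2, 4])
```

Pruning tokens before fusion also illustrates how a pooling step can cut the quadratic cost of any subsequent attention over long text sequences, which is the complexity motivation the abstract gives.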
Pages: 24